Preface



The Third International Workshop on Experimental and Efficient Algorithms 
(WEA 2004) was held in Angra dos Reis (Brazil), May 25-28, 2004. 

The WEA workshops are sponsored by the European Association for Theoretical
Computer Science (EATCS). They are intended to provide an international
forum for researchers in the areas of design, analysis, and experimental
evaluation of algorithms. The two preceding workshops in this series were
held in Riga (Latvia, 2001) and Ascona (Switzerland, 2003).

This proceedings volume comprises 40 contributed papers selected by the 
Program Committee along with the extended abstracts of the invited lectures 
presented by Richard Karp (University of California at Berkeley, USA), Giuseppe 
Italiano (University of Rome “Tor Vergata”, Italy), and Christos Kaklamanis 
(University of Patras, Greece). 

As the organizer and chair of this workshop, I would like to thank all the
authors who generously supported this project by submitting their papers for
publication in this volume. I am also grateful to the invited lecturers, who
kindly accepted our invitation.

For their dedication and collaboration in the refereeing procedure, I would
also like to express my gratitude to the members of the Program Committee:
E. Amaldi (Italy), J. Blazewicz (Poland), V.-D. Cung (France), U. Derigs
(Germany), J. Diaz (Spain), M. Gendreau (Canada), A. Goldberg (USA),
P. Hansen (Canada), T. Ibaraki (Japan), K. Jansen (Germany), S. Martello
(Italy), C.C. McGeoch (USA), L.S. Ochi (Brazil), M.G.C. Resende (USA),
J. Rolim (Switzerland), S. Skiena (USA), M. Sniedovich (Australia),
C.C. Souza (Brazil), P. Spirakis (Greece), D. Trystram (France), and S. Voss
(Germany). I am also grateful to the anonymous referees who assisted the
Program Committee in the selection of the papers to be included in this
publication.

The idea of organizing WEA 2004 in Brazil grew out of a few meetings with
Jose Rolim (University of Geneva, Switzerland). His encouragement and close
collaboration at different stages of this project were fundamental for the
success of the workshop. The support of EATCS and Alfred Hofmann
(Springer-Verlag) was also appreciated.

I am thankful to the Department of Computer Science of Universidade Federal
Fluminense (Niteroi, Brazil) for fostering the environment in which this
workshop was organized. I am particularly indebted to Simone Martins for her
invaluable support and collaboration in the editorial work involved in the
preparation of the final camera-ready copy of this volume.



Angra dos Reis (Brazil), May 2004 



Celso C. Ribeiro (Chair)

Abstract. The multiprocessor scheduling problem consists in scheduling
a set of tasks with known processing times into a set of identical
processors so as to minimize their makespan, i.e., the maximum processing
time over all processors. We propose a new heuristic for solving the
multiprocessor scheduling problem, based on a hybrid heuristic to the bin
packing problem. Computational results illustrating the effectiveness of
this approach are reported and compared with those obtained by other
heuristics.



1 Introduction 

Let $T = \{T_1, \ldots, T_n\}$ be a set of $n$ tasks with processing times
$t_i$, $i = 1, \ldots, n$, to be processed by a set $P = \{P_1, \ldots, P_m\}$
of $m \ge 2$ identical processors. We assume the processing times are
nonnegative integers satisfying $t_1 \ge t_2 \ge \ldots \ge t_n$. Each
processor can handle at most one task at any given time and preemption is not
possible. We denote by $A_j$ the set formed by the indices of the tasks
assigned to processor $P_j$ and by $t(P_j) = \sum_{i \in A_j} t_i$ its overall
processing time, $j = 1, \ldots, m$. A solution is represented by the lists of
tasks assigned to each processor. The makespan of a solution
$S = (A_1, \ldots, A_m)$ is given by
$C_{\max}(S) = \max_{j=1,\ldots,m} t(P_j)$.
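As a small illustration of this notation, the following Python sketch computes the makespan of a solution stored as one list of task indices per processor. The function name `makespan`, the 0-based indexing, and the sample data are assumptions made for this example only.

```python
def makespan(assignment, times):
    """Return C_max(S) for a solution S given as one list of task indices per processor."""
    loads = [sum(times[i] for i in tasks) for tasks in assignment]
    return max(loads)


times = [7, 5, 4, 3, 1]            # processing times in non-increasing order
solution = [[0, 3], [1, 2, 4]]     # tasks of processor 1 and of processor 2
print(makespan(solution, times))   # both processors have load 10
```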

The multiprocessor scheduling problem $P\|C_{\max}$ consists in finding an
optimal assignment of tasks to processors, so as to minimize their makespan,
see e.g. [5,16,20]. $P\|C_{\max}$ is NP-hard [4,14]. We denote the optimal
makespan by $C^*_{\max}$. Minimizing the schedule length is important since it
leads to the maximization of the processor utilization factor [3].

There is a duality relation [5,17,18,24] between $P\|C_{\max}$ and the bin
packing problem (BP), which consists in finding the minimum number of bins of
a given capacity $C$ which are necessary to accommodate $n$ items with weights
$t_1, \ldots, t_n$.

The worst case performance ratio $r(H)$ of a given heuristic $H$ for
$P\|C_{\max}$ is defined as the maximum value of the ratio
$H(I)/C^*_{\max}(I)$ over all instances $I$, where $C^*_{\max}(I)$ is the
optimal makespan of instance $I$ and $H(I)$ is the makespan of the solution
computed by heuristic $H$. The longest processing time (LPT) heuristic of
Graham [15] finds an approximate solution to $P\|C_{\max}$ in time
$O(n \log n + n \log m)$, with $r(\mathrm{LPT}) = 4/3 - 1/(3m)$. The MULTIFIT
heuristic proposed by Coffman et al. [6] explores the duality relation with
the bin packing problem, searching by binary search the minimum processing
time (or bin capacity) such that the solution obtained by the FFD heuristic
[19,21] to pack the $n$ tasks (or items) makes use of at most $m$ processors
(or bins). It can be shown that if MULTIFIT is applied $k$ times, then it runs
in time $O(n \log n + kn \log m)$ and $r(\mathrm{MULTIFIT}) = 1.22 + 2^{-k}$.
Friesen [13] subsequently improved this ratio to $1.20 + 2^{-k}$. Yue [25]
further improved it to 13/11, which is tight. Finn and Horowitz [9] proposed
the 0/1-INTERCHANGE heuristic running in time $O(n \log m)$, with worst case
performance ratio equal to 2. Variants and extensions of the above heuristics
can be found in the literature.
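The MULTIFIT idea can be sketched in Python as follows. This is a minimal illustration rather than the implementation of [6]: it assumes integer processing times and bisects on integer capacities until the interval closes, instead of performing a fixed number k of iterations, and the names `ffd_fits` and `multifit` are chosen here for the example. The lower end of the search interval is a valid lower bound on any makespan; at the upper end (twice the rounded-up average load, or the longest task if larger) FFD always succeeds by a simple counting argument.

```python
def ffd_fits(times, m, capacity):
    """First Fit Decreasing: True if the tasks fit into at most m bins of this capacity."""
    loads = []                                  # current load of each open bin
    for t in sorted(times, reverse=True):
        for j, load in enumerate(loads):
            if load + t <= capacity:            # first open bin with enough room
                loads[j] += t
                break
        else:                                   # no open bin can take the task
            if len(loads) == m or t > capacity:
                return False
            loads.append(t)
    return True


def multifit(times, m):
    """Smallest integer capacity reached by bisection for which FFD uses at most m bins."""
    average = -(-sum(times) // m)               # ceil(sum / m)
    lo = max(average, max(times))               # no schedule can be shorter than this
    hi = max(2 * average, max(times))           # FFD always fits at this capacity
    while lo < hi:
        mid = (lo + hi) // 2
        if ffd_fits(times, m, mid):
            hi = mid
        else:
            lo = mid + 1
    return hi


print(multifit([7, 5, 4, 3, 1], 2))             # 10
```

The returned capacity is the makespan of the packing FFD builds at that capacity, so it is an upper bound on the optimal makespan of the scheduling instance.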

The duality relation between the bin packing problem and $P\|C_{\max}$ was
also used by Alvim and Ribeiro [1] to derive a hybrid improvement heuristic to
the former. This heuristic is explored in this paper in the context of
$P\|C_{\max}$. Lower and upper bounds used by the heuristic are described in
Section 2. The main steps of the hybrid improvement heuristic to
multiprocessor scheduling are presented in Section 3. Numerical results
illustrating the effectiveness of the proposed algorithm are reported in
Section 4. Concluding remarks are made in the last section.



2 Lower and Upper Bounds 

The lower bound
$L_1 = \max\{\lceil \frac{1}{m} \sum_{i=1}^{n} t_i \rceil, \max_{i=1,\ldots,n} t_i\}$
proposed by McNaughton [22] establishes that the optimal makespan cannot be
smaller than the maximum between the average processing time over all
processors and the longest duration over all tasks. This bound can be further
improved to $L_2 = \max\{L_1, t_m + t_{m+1}\}$.
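For concreteness, these two bounds can be computed as in the short sketch below. It assumes integer processing times and more tasks than processors for the second bound, which rests on the observation that two of the m+1 longest tasks must share a processor; the function name `lower_bounds` is hypothetical.

```python
def lower_bounds(times, m):
    """Return (L1, L2) for integer processing times on m identical processors."""
    t = sorted(times, reverse=True)             # t[0] >= t[1] >= ...
    l1 = max(-(-sum(t) // m), t[0])             # ceil(average load) vs. longest task
    l2 = l1
    if len(t) > m:                              # two of the m+1 longest tasks share a processor
        l2 = max(l1, t[m - 1] + t[m])
    return l1, l2


print(lower_bounds([6, 6, 6], 2))               # (9, 12)
```

On this tiny instance the first bound is 9 while the second one reaches 12, which is the optimal makespan.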

Dell'Amico and Martello [7] proposed the lower bound $L_3$. They also showed
that the combination of lower and upper bounds to the makespan makes it
possible to derive lower and upper bounds to the number of tasks assigned to
each processor, leading to a new lower bound $L_\vartheta$. Bounds $L_2$,
$L_3$, and $L_\vartheta$ are used in the heuristic described in the next
section.

We used three construction procedures for building feasible solutions and
computing upper bounds to $P\|C_{\max}$:

- Construction heuristic H1: Originally proposed in [1,2], it is similar to
  the Multi-Subset heuristic in [7]. It considers one processor at a time.
  The longest yet unassigned task is assigned to the current processor. Next,
  assign to this same processor a subset of the yet unassigned tasks such
  that the sum of their processing times is as close as possible to a given
  limit to the makespan. The polynomial-time approximation scheme MTSS(3) of
  Martello and Toth [21] is used in this step. The remaining unassigned tasks
  are considered one by one in non-increasing order of their processing
  times. Each of them is assigned to the processor with the smallest load.

- Construction heuristic H2: Hochbaum and Shmoys [17] proposed a new approach
  to constructing approximation algorithms, called dual approximation
  algorithms. The goal is to find superoptimal, but infeasible, solutions.
  They showed that finding an $\epsilon$-approximation to $P\|C_{\max}$ is
  equivalent to finding an $\epsilon$-dual-approximation to BP. For the
  latter, an $\epsilon$-dual-approximation algorithm constructs a solution in
  which the number of bins is at most the optimal number of bins and each bin
  is filled with at most $1 + \epsilon$ (bin capacity $C = 1$ and item
  weights $t_i$ scaled by $t_i/C$). In particular, they proposed a scheme for
  $\epsilon = 1/5$. Given a lower bound $L$ and an upper bound $U$ to
  $C^*_{\max}$, we obtain by binary search the smallest value $C$ such that
  $L \le C \le U$ and the 1/5-dual-approximation solution to BP uses no more
  than $m$ bins. This approach characterizes a
  $(1/5 + 2^{-k})$-approximation algorithm to $P\|C_{\max}$, where $k$ is the
  number of iterations of the search. At the end of the search, the value of
  $C$ gives the $L_{HS}$ lower bound to the makespan.

- Construction heuristic H3: This is the longest processing time heuristic
  LPT [15]. Tasks are ordered in non-increasing order of their processing
  times. Each task is assigned to the processor with the smallest total
  processing time.
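To make heuristics H1 and H3 more concrete, here is a rough Python sketch of both. It is not the implementation of [1,2]: the subset selection step of H1 uses a small exact dynamic program (`best_subset`, a stand-in chosen for this example) in place of the approximation scheme MTSS(3), the `limit` argument stands for the given limit to the makespan, and the subset is chosen to fill the room left on the processor after its first task, which is one possible reading of the description. All function names are hypothetical.

```python
import heapq


def lpt(times, m):
    """H3: longest processing time first; returns (assignment, makespan)."""
    heap = [(0, j) for j in range(m)]           # (load, processor), smallest load on top
    assignment = [[] for _ in range(m)]
    for i in sorted(range(len(times)), key=lambda i: -times[i]):
        load, j = heapq.heappop(heap)
        assignment[j].append(i)
        heapq.heappush(heap, (load + times[i], j))
    return assignment, max(load for load, _ in heap)


def best_subset(candidates, room):
    """Indices of a subset of (index, time) pairs with the largest total not exceeding room."""
    reach = {0: []}                             # attainable total -> indices achieving it
    for i, t in candidates:
        for total, chosen in list(reach.items()):
            if total + t <= room and total + t not in reach:
                reach[total + t] = chosen + [i]
    return reach[max(reach)]


def h1(times, m, limit):
    """Sketch of H1 with an exact subset-sum step instead of MTSS(3)."""
    unassigned = sorted(range(len(times)), key=lambda i: -times[i])
    assignment = [[] for _ in range(m)]
    for tasks in assignment:                    # one processor at a time
        if not unassigned:
            break
        first = unassigned.pop(0)               # longest yet unassigned task
        room = limit - times[first]
        extra = best_subset([(i, times[i]) for i in unassigned], room)
        tasks.extend([first] + extra)
        unassigned = [i for i in unassigned if i not in extra]
    loads = [sum(times[i] for i in tasks) for tasks in assignment]
    for i in unassigned:                        # leftovers, longest first
        j = loads.index(min(loads))
        assignment[j].append(i)
        loads[j] += times[i]
    return assignment, max(loads)


times = [5, 5, 4, 4, 3, 3, 3]
print(lpt(times, 3)[1], h1(times, 3, limit=9)[1])   # 11 9
```

On this classical instance LPT ends with makespan 11, whereas filling each processor up to the limit 9 yields a perfectly balanced schedule.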

França et al. [10] proposed algorithm 3-PHASE based on the idea of balancing
the load between pairs of processors. Hübscher and Glover [18] also explored
the relation between $P\|C_{\max}$ and BP, proposing a tabu search algorithm
using 2-exchanges and influential diversification. Dell'Amico and Martello
[7] developed a branch-and-bound algorithm to exactly solve $P\|C_{\max}$.
They also obtained new lower bounds. Scholl and Voss [24] considered two
versions of the simple assembly line balancing problem. If the precedence
constraints are not taken into account, these two versions correspond to BP
and $P\|C_{\max}$. Fatemi-Ghomi and Jolai-Ghazvini [8] proposed a local search
algorithm using a neighborhood defined by exchanges of pairs of tasks in
different processors. Frangioni et al. [12] proposed new local search
algorithms for the minimum makespan processor scheduling problem, which
perform multiple exchanges of jobs among machines. The latter are modelled as
special disjoint cycles and paths in a suitably defined improvement graph.
Several algorithms for searching the neighborhood are suggested and
computational experiments are performed for the case of identical processors.

3 Hybrid Improvement Heuristic to $P\|C_{\max}$

The hybrid improvement heuristic to $P\|C_{\max}$ is described in this
section. The core of this procedure is formed by the construction,
redistribution, and improvement phases, as illustrated by the pseudo-code of
procedure C+R+I in Figure 1. It depends on two parameters: the target
makespan Target and the maximum number of iterations MaxIterations performed
by the tabu search procedure used in the improvement phase. The loop in lines
1-8 makes three attempts to build a feasible solution S to the bin packing
problem defined by the processing times $t_i$, $i = 1, \ldots, n$, with bin
capacity Target, using exactly m bins. One of the heuristics H1, H2, and H3
is used at each attempt in line 2. If S is feasible to the associated bin
packing problem, then it is returned in line 3. Otherwise, load
redistribution is performed in line 4 to improve processor usability and the
modified solution S is returned in line 5 if it is feasible to the bin
packing problem. Finally, a tabu search procedure is applied in line 6 as an
attempt to knock down the makespan of the current solution and the modified
solution S is returned in line 7 if it is feasible to the bin packing
problem. Detailed descriptions of the redistribution and improvement phases
are reported in [1].

procedure C+R+I(Target, MaxIterations);
1   for $k = 1, 2, 3$ do
2       Build a solution $S = \{A_1, \ldots, A_m\}$ to $P\|C_{\max}$ using heuristic H$k$;
3       if $C_{\max}(S) \le$ Target then return $S$;
4       $S \leftarrow$ Redistribution($S$);
5       if $C_{\max}(S) \le$ Target then return $S$;
6       $S \leftarrow$ TabuSearch($S$, MaxIterations);
7       if $C_{\max}(S) \le$ Target then return $S$;
8   end
9   return $S$;
end C+R+I

Fig. 1. Pseudo-code of the core C+R+I procedure.
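The control flow of Figure 1 can be paraphrased in Python as below. The redistribution and tabu search phases are detailed in [1] and are not reproduced here; they are passed in as callables, and every name in the signature is a placeholder introduced for this sketch.

```python
def c_r_i(target, max_iterations, builders, redistribute, tabu_search, cmax):
    """Try each construction heuristic in turn; stop as soon as the target makespan is met."""
    solution = None
    for build in builders:                      # stands for H1, H2, H3 in Figure 1
        solution = build()
        if cmax(solution) <= target:
            return solution
        solution = redistribute(solution)
        if cmax(solution) <= target:
            return solution
        solution = tabu_search(solution, max_iterations)
        if cmax(solution) <= target:
            return solution
    return solution                             # last solution tried, target not reached


# Toy usage: a "solution" is just the list of processor loads.
loads = c_r_i(
    target=10,
    max_iterations=1000,
    builders=[lambda: [12, 8]],
    redistribute=lambda s: s,                   # placeholder: changes nothing
    tabu_search=lambda s, it: [10, 10],         # placeholder: pretends to balance the loads
    cmax=max,
)
print(loads)                                    # [10, 10]
```

Passing the phases as parameters keeps the skeleton independent of how redistribution and tabu search are actually realized.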

The pseudo-code of the complete hybrid improvement heuristic HI_PCmax to
$P\|C_{\max}$ is given in Figure 2. An initial solution S is built in line 1
using heuristic H3. The lower bound $L_2$ is computed in line 2. If the
current lower and upper bounds coincide, then solution S is returned in line
3. The lower bound $L_3$ is computed in line 4 and the current lower bound is
updated. If the current lower and upper bounds coincide, then solution S is
returned in line 5. The lower bound $L_\vartheta$ is computed in line 6 and
the current lower bound is updated. If the current lower and upper bounds
coincide, then solution S is returned in line 7. A new solution S' is built
in line 8 using heuristic H2. The currently best solution and the current
upper bound are updated in line 9, while the current lower bound is updated
in line 10. If the current lower and upper bounds coincide, then the
currently best solution S is returned in line 11. A new solution S' is built
in line 12 using heuristic H1. The currently best solution and the current
upper bound are updated in line 13. If the current lower and upper bounds
coincide, then the currently best solution S is returned in line 14. At this
point, UB is the upper bound associated with the currently best known
solution S to $P\|C_{\max}$ and LB is an unattained makespan. The core
procedure C+R+I makes an attempt to build a solution with makespan equal to
the current lower bound in line 15. The currently best solution and the
current upper bound are updated in line 16. If the current lower and upper
bounds coincide, then the currently best solution S is returned in line 17.
The loop in lines 18-23 implements a binary search strategy seeking
progressively better solutions. The target makespan
$C_{\max} = \lfloor (LB + UB)/2 \rfloor$ is set in line 19. Let S' be the
solution obtained by the core procedure C+R+I applied in line 20 using
$C_{\max}$ as the target makespan. If its makespan is at least as good as the
target makespan





A Hybrid Bin-Packing Heuristic to Multiprocessor Scheduling 5 



$C_{\max}$, then the current upper bound UB and the currently best solution S
are updated in line 21. Otherwise, the unattained makespan LB is updated in
line 22, since the core procedure C+R+I was not able to find a feasible
solution with the target makespan. The best solution found S is returned in
line 24.

The core procedure C+R+I is applied at two different points: once in line 15
using the lower bound LB as the target makespan and in line 20 at each
iteration of the binary search strategy using $C_{\max}$ as the target
makespan. This implementation follows the same EBS (binary search with
prespecified entry point) scheme suggested in [24]. Computational experiments
have shown that it is able to find better solutions in smaller computation
times than other variants which do not explore the binary search strategy or
do not make a preliminary attempt to build a solution using LB as the target
makespan.
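The binary search loop in lines 18-23 of Figure 2 can be sketched as follows. The callable `core` stands for the C+R+I procedure with the iteration limit already fixed, `cmax` returns the makespan of a solution, and both names, like `binary_search_phase` itself, are assumptions of this example. The invariant is that LB is a makespan the heuristic failed to reach while UB is the makespan of the best solution found.

```python
def binary_search_phase(lb, ub, best, core, cmax):
    """Bisect between an unattained makespan lb and the attained makespan ub = cmax(best)."""
    while lb < ub - 1:
        target = (lb + ub) // 2
        candidate = core(target)
        if cmax(candidate) <= target:           # target reached: tighten the upper bound
            ub, best = cmax(candidate), candidate
        else:                                   # target missed: treat it as unattained
            lb = target
    return best


# Toy usage: a "solution" is represented by its makespan, and the stand-in
# core procedure can reach 108 but nothing better.
core = lambda target: 108 if target >= 108 else 110
print(binary_search_phase(100, 110, 110, core, cmax=lambda s: s))   # 108
```

Because UB is set to the makespan actually obtained, and not to the target, the interval can shrink faster than a plain bisection would suggest.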

4 Computational Experiments 

All computational experiments were performed on a 2.4 GHz AMD XP machine 
with 512 MB of RAM memory. 



procedure HI_PCmax(MaxIterations);
1   Compute a solution $S$ using heuristic H3 and set $UB \leftarrow C_{\max}(S)$;
2   Compute the lower bound $L_2$ and set $LB \leftarrow L_2$;
3   if $LB = UB$ then return $S$;
4   Compute $L_3$ using binary search in the interval $[LB, UB]$ and set $LB \leftarrow L_3$;
5   if $LB = UB$ then return $S$;
6   Compute $L_\vartheta$ using $LB$ and $UB$ and update $LB \leftarrow \max\{LB, L_\vartheta\}$;
7   if $LB = UB$ then return $S$;
8   Compute a solution $S'$ and the lower bound $L_{HS}$ using heuristic H2;
9   if $C_{\max}(S') < UB$ then set $UB \leftarrow C_{\max}(S')$ and $S \leftarrow S'$;
10  Update $LB \leftarrow \max\{LB, L_{HS}\}$;
11  if $LB = UB$ then return $S$;
12  Compute a solution $S'$ using heuristic H1;
13  if $C_{\max}(S') < UB$ then set $UB \leftarrow C_{\max}(S')$ and $S \leftarrow S'$;
14  if $LB = UB$ then return $S$;
15  $S' \leftarrow$ C+R+I($LB$, MaxIterations);
16  if $C_{\max}(S') < UB$ then set $UB \leftarrow C_{\max}(S')$ and $S \leftarrow S'$;
17  if $LB = UB$ then return $S$;
18  while $LB < UB - 1$ do
19      $C_{\max} \leftarrow \lfloor (LB + UB)/2 \rfloor$;
20      $S' \leftarrow$ C+R+I($C_{\max}$, MaxIterations);
21      if $C_{\max}(S') \le C_{\max}$ then set $UB \leftarrow C_{\max}(S')$ and $S \leftarrow S'$;
22      else $LB \leftarrow C_{\max}$;
23  end
24  return $S$;
end HI_PCmax

Fig. 2. Pseudo-code of the hybrid improvement procedure to $P\|C_{\max}$.

Algorithms HI_PCmax and LPT were coded in C and compiled with version 2.95.2
of the gcc compiler with the optimization flag -O3. The maximum number of
iterations during the tabu search improvement phase is set as
MaxIterations = 1000. We compare the new heuristic HI_PCmax with the 3-PHASE
heuristic of França et al. [10], the branch-and-bound algorithm B&B of
Dell'Amico and Martello [7], and the ME multi-exchange algorithms of
Frangioni et al. [12]. The code of algorithm B&B [7] was provided by
Dell'Amico and Martello.



4.1 Test Problems 

We considered two families of test problems: uniform and non-uniform. In
these families, the number of processors $m$ takes values in $\{5, 10, 25\}$
and the number of tasks $n$ takes values in $\{50, 100, 500, 1000\}$ (the
combination $m = 5$ with $n = 10$ is also tested). The processing times
$t_i$, $i = 1, \ldots, n$, are randomly generated in the intervals [1, 100],
[1, 1000], and [1, 10000]. Ten instances are generated for each of the 39
classes defined by each combination of $m$, $n$ and processing time interval.

The two families differ by the distribution of the processing times. The
instances in the first family were generated by França et al. [10] with
processing times uniformly distributed in each interval. The generator
developed by Frangioni et al. [11] for the second family was obtained from
[23]. For a given interval $[a, b]$ of processing times, with $a = 1$ and
$b \in \{100, 1000, 10000\}$, their generator selects 98% of the processing
times from a uniform distribution in the interval $[0.9(b - a), b]$ and the
remaining 2% in the interval $[a, 0.2(b - a)]$.
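A generator in the spirit of this description could look like the Python sketch below. It is not the generator of [11,23]: the exact 98%/2% split of the n tasks, the rounding of the interval endpoints to integers, the seeding, and the name `non_uniform_instance` are all assumptions made for illustration.

```python
import math
import random


def non_uniform_instance(n, b, a=1, seed=0):
    """Processing times for one non-uniform instance, sorted in non-increasing order."""
    rng = random.Random(seed)
    n_long = round(0.98 * n)                    # assumption: exact 98% / 2% split
    long_lo = math.ceil(0.9 * (b - a))          # assumption: endpoints rounded inwards
    short_hi = max(a, math.floor(0.2 * (b - a)))
    times = [rng.randint(long_lo, b) for _ in range(n_long)]
    times += [rng.randint(a, short_hi) for _ in range(n - n_long)]
    return sorted(times, reverse=True)


instance = non_uniform_instance(n=100, b=100)
print(instance[:3], instance[-2:])              # a few long tasks and the two short ones
```

With b = 100, the long tasks fall in [90, 100] and the short ones in [1, 19], so almost all tasks have nearly the same duration.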

4.2 Search Strategy 

In the first computational experiment, we compare three different search
methods that can be used with HI_PCmax.

In all three methods the core procedure C+R+I is used to check whether there
exists a solution with a certain target makespan in the interval $[LB, UB]$.
In the lower bound method (LBM), we start the search using LB as the target
makespan, which is progressively increased by one. In the binary search
method (BSM), the interval $[LB, UB]$ is progressively bisected by setting
$\lfloor (LB + UB)/2 \rfloor$ as the target makespan. The binary search with
prespecified entry point method (EBS) is that used in the pseudo-code in
Figure 2. It follows the same general strategy of BSM, but starts using LB as
the first target makespan.
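The difference between the three strategies can be illustrated with a toy model in which the call to C+R+I is replaced by an idealized oracle `attempt(target)` that succeeds exactly when the target is at least some attainable makespan. This monotone oracle, the function `search`, and the precise way the interval is bisected are assumptions of the sketch; in particular, the real procedure may return a solution better than the target. The sketch only counts oracle calls.

```python
def search(lb, ub, attempt, method):
    """Return (best target reached, number of calls to attempt) for LBM, BSM, or EBS."""
    calls = 0

    def tried(target):
        nonlocal calls
        calls += 1
        return attempt(target)

    if lb == ub:                                # bounds already coincide
        return ub, calls
    if method == "LBM":                         # increase the target one unit at a time
        target = lb
        while target < ub and not tried(target):
            target += 1
        return target, calls
    lo, hi = lb, ub                             # hi is always attainable
    if method == "EBS":                         # entry point: try the lower bound first
        if tried(lb):
            return lb, calls
        lo = lb + 1
    while lo < hi:
        mid = (lo + hi) // 2
        if tried(mid):
            hi = mid
        else:
            lo = mid + 1
    return hi, calls


tight = lambda target: target >= 100            # the lower bound itself is attainable
loose = lambda target: target >= 108            # the best attainable makespan is 108
for method in ("LBM", "BSM", "EBS"):
    print(method, search(100, 110, tight, method), search(100, 110, loose, method))
```

In this model EBS matches LBM when the lower bound is attainable (one call instead of four for BSM) and stays close to BSM otherwise (four calls instead of nine for LBM), which is consistent with the behaviour reported for the three methods.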

Table 1 presents the results observed with each search method. For each
family of test problems, we report the following statistics for the 130
instances with the same interval for the processing times: the total
computation time in seconds, the maximum and the average number of executions
of the core procedure C+R+I. These results show that EBS is consistently
better than the other methods: the same solutions are always found by the
three methods, with significantly smaller total times and fewer executions of
C+R+I in the case of EBS.

Table 1. Search strategies: LBM, BSM, and EBS.

                         LBM                    BSM                    EBS
Instances    $t_i$       time (s)  max   avg    time (s)  max   avg    time (s)  max   avg
uniform      [1, 100]        0.17    1  1.00        0.17    1  1.00        0.14    1  1.00
             [1, 1000]       3.02   14  2.88        2.16    6  3.38        1.98    6  2.38
             [1, 10000]     33.42  190  1.71       11.77   10  6.50       11.67   10  6.23
non-uniform  [1, 100]       16.07    8  1.73       21.56    5  3.56       14.02    4  1.47
             [1, 1000]      47.08   26  1.77       54.14    9  6.06       21.94    6  1.15
             [1, 10000]    499.77  254  9.18      491.21   12  8.98       83.11   10  2.02



4.3 Phases 

In this experiment, we investigate the effect of the preprocessing,
construction, redistribution, and improvement phases. Four variants of the
hybrid improvement procedure HI_PCmax are created:

- Variant P: only lines 1-14 corresponding to the preprocessing phase of the
  pseudo-code in Figure 2 are executed.

- Variant P+C: the core procedure C+R+I is implemented without the
  redistribution and improvement phases.

- Variant P+C+R: the core procedure C+R+I is implemented without the
  improvement phase.

- Variant P+C+R+I: the core procedure C+R+I is fully implemented with all
  phases, corresponding to the complete HI_PCmax procedure itself.

Table 2 shows the results obtained with each variant. The differences between
corresponding columns associated with consecutive variants give a picture of
the effect of each additional phase. For each family of test problems and for
each interval of processing times, we report the number of optimal solutions
found and the total computation time in seconds over all 130 instances.



Table 2. Phases: preprocessing, construction, redistribution, and improvement.

                         P                 C                 C+R               C+R+I
Instances    $t_i$       opt.  time (s)    opt.  time (s)    opt.  time (s)    opt.  time (s)
uniform      [1, 100]     130      0.06     130      0.06     130      0.07     130      0.14
             [1, 1000]    122      1.37     123      1.38     125      1.59     126      1.98
             [1, 10000]   101      2.50     103      2.77     110      4.70     110     11.67
                          353               356               365               366
non-uniform  [1, 100]      71      0.48      71      1.17      85     20.51     120     14.02
             [1, 1000]     65      0.62      65      1.90      70     40.76     128     21.94
             [1, 10000]    68      1.08      68      4.44      70     87.10     121     83.11
                          204               204               225               369



8 A.C.F. Alvim and C.C. Ribeiro 



These results show that the uniform instances are relatively easy and the
construction, redistribution, and improvement phases do not bring significant
benefits in terms of solution quality or computation times. We notice that
90.5% of the 390 uniform instances are already solved to optimality after the
preprocessing phase. The three additional phases allow solving only 13 other
instances to optimality, at the cost of multiplying the total computation
time by a factor of almost five. This picture is quite different for the
non-uniform instances. In this case, only 204 out of the 390 test problems
(52.3%) are solved to optimality after the preprocessing phase. The complete
procedure with all phases made it possible to solve 165 additional instances
to optimality.

In consequence of the order in which the three heuristics are applied in the
preprocessing phase (lines 1, 8, and 12 of the pseudo-code in Figure 2), of
the 557 optimal solutions found after this phase, 171 were obtained with the
LPT heuristic H3, one with the $(1/5 + 2^{-k})$-approximation heuristic H2,
and 385 with the construction heuristic H1 proposed in Section 2. However, we
note that if the lower bound $\max\{L_2, L_3, L_\vartheta, L_{HS}\}$ is used,
then heuristic H1 alone is capable of finding 556 out of the 557 optimal
solutions obtained during the preprocessing phase. This shows that H1 is
indeed a very useful fast heuristic to $P\|C_{\max}$.



4.4 Comparison with Other Approaches 

In this final set of experiments, we compare the hybrid improvement heuristic
HI_PCmax with the list scheduling algorithm LPT [15], the 3-PHASE heuristic
of França et al. [10], and the branch-and-bound algorithm B&B of Dell'Amico
and Martello [7] with the number of backtracks set at 4000 as suggested by
the authors, as well as with the best solution found by the multi-exchange
(ME) algorithms.



Table 3. Comparative results, uniform instances, $t_i \in [1, 100]$.

          LPT            B&B                      HI_PCmax                 3-PHASE
 m     n  error     opt  error     opt  time (s)  error     opt  time (s)  error
 5    10  3.54e-03    9  0          10            0          10  -         0.018
 5    50  4.58e-03    1  0          10            0          10  -         0.000
 5   100  8.81e-04    4  0          10            0          10  -         0.000
 5   500  0          10  0          10            0          10  -         0.000
 5  1000  0          10  0          10            0          10  -         0.000
10    50  1.56e-02    0  0          10            0          10  -         0.002
10   100  3.64e-03    1  0          10            0          10  -         0.002
10   500  1.20e-04    7  0          10            0          10  -         0.000
10  1000  0          10  0          10            0          10  -         0.000
25    50  8.61e-03    6  0          10            0          10  0.01      0.011
25   100  2.37e-02    0  0          10            0          10  -         0.003
25   500  9.04e-04    4  0          10            0          10  -         0.000
25  1000  0          10  0          10            0          10  -         0.000

"-" is used to indicate negligible computation times.

Table 4. Comparative results, uniform instances, $t_i \in [1, 1000]$.

          LPT            B&B                      HI_PCmax                 3-PHASE
 m     n  error     opt  error     opt  time (s)  error     opt  time (s)  error
 5    10  0          10  0          10  -         0          10  -         0.010
 5    50  3.33e-03    0  0          10  -         0          10  -         0.001
 5   100  1.02e-03    0  0          10  -         0          10  -         0.000
 5   500  4.63e-05    1  0          10  -         0          10  0.02      0.000
 5  1000  7.97e-06    5  0          10  -         0          10  0.05      0.000
10    50  1.61e-02    0  8.17e-05    8  0.01      7.74e-05    8  0.02      0.002
10   100  4.04e-03    0  0          10  -         0          10  -         0.000
10   500  2.21e-04    0  0          10  -         0          10  0.01      0.000
10  1000  4.42e-05    3  0          10  -         0          10  0.04      0.000
25    50  1.06e-02    4  0          10  -         0          10  0.02      0.011
25   100  3.15e-02    0  2.07e-04    8  0.11      9.93e-05    8  0.04      0.003
25   500  1.41e-03    0  0          10  -         0          10  -         0.000
25  1000  2.75e-04    0  0          10  -         0          10  0.01      0.000

"-" is used to indicate negligible computation times.

Table 5. Comparative results, uniform instances, $t_i \in [1, 10000]$.

          LPT            B&B                      HI_PCmax                 3-PHASE
 m     n  error     opt  error     opt  time (s)  error     opt  time (s)  error
 5    10  6.48e-03    9  0          10  -         0          10  0.03      0.013
 5    50  5.92e-03    0  9.26e-06    7  0.01      0          10  0.03      0.000
 5   100  1.41e-03    0  0          10  -         0          10  -         0.000
 5   500  4.87e-05    0  0          10  -         0          10  0.01      0.000
 5  1000  1.02e-05    0  0          10  -         0          10  0.14      0.000
10    50  2.45e-02    0  1.14e-03    0  0.18      1.03e-04    0  0.25      0.004
10   100  4.82e-03    0  0          10  -         0          10  -         0.000
10   500  2.34e-04    0  0          10  -         0          10  0.01      0.000
10  1000  6.31e-05    0  0          10  -         0          10  0.04      0.000
25    50  7.25e-03    6  0          10  -         0          10  0.03      0.004
25   100  2.76e-02    0  4.49e-03    0  0.74      3.47e-04    0  0.59      0.001
25   500  1.00e-03    0  0          10  -         0          10  0.01      a
25  1000  3.12e-04    0  0          10  -         0          10  0.02      0.000

"-" is used to indicate negligible computation times.
a: not reported in [10].



Tables 3 to 5 report the results obtained by heuristics LPT, B&B, HI_PCmax,
and 3-PHASE for the uniform instances. Tables 6 to 8 give the results
obtained by heuristics LPT, B&B, HI_PCmax, and ME for the non-uniform
instances. The following statistics over all ten test instances with the same
combination of $m$ and $n$ are reported: (a) average relative errors with
respect to the best lower bound for algorithms LPT, B&B, and HI_PCmax;
(b) average relative errors reported in [10] for algorithm 3-PHASE;
(c) average relative errors reported in [12] for the best among the solutions
obtained by ME algorithms 1-SPT, 1-BPT, and K-SPT; (d) number of optimal
solutions found by LPT, B&B, and HI_PCmax; (e) average computation times
observed for LPT, B&B, and HI_PCmax on a 2.4 GHz AMD XP machine; and
(f) average computation times reported in [12] for the best ME algorithm on a
400 MHz Pentium II with 256 Mbytes of RAM memory.

Table 6. Comparative results, non-uniform instances, $t_i \in [1, 100]$.

          LPT            B&B                      HI_PCmax                 ME
 m     n  error     opt  error     opt  time (s)  error     opt  time (s)  error     time (s)
 5    10  0          10  0          10  -         0          10  -         0         -
 5    50  9.37e-03    0  7.24e-03    0  0.11      0          10  0.03      8.58e-03  0.01
 5   100  1.67e-02    0  0          10  -         0          10  0.01      5.31e-05  0.02
 5   500  5.86e-04    0  0          10  0.01      0          10  0.01      0         1.01
 5  1000  1.44e-04    0  0          10  -         0          10  0.02      0         9.71
10    50  1.13e-02    0  8.34e-03    2  0.23      7.28e-03    4  0.30      1.57e-02  -
10   100  9.02e-03    0  7.96e-03    0  0.21      2.13e-04    8  0.22      5.09e-03  0.04
10   500  1.00e-02    0  0          10  0.02      0          10  -         2.13e-05  3.26
10  1000  3.83e-04    1  0          10  0.01      0          10  -         0         17.14
25    50  0          10  0          10  -         0          10  -         0         0.01
25   100  4.77e-03    0  2.65e-03    3  1.12      1.34e-03    8  0.55      9.85e-03  0.04
25   500  9.79e-03    0  1.60e-04    8  1.63      0          10  0.08      2.12e-04  3.29
25  1000  9.30e-03    0  1.17e-03    4  9.56      0          10  0.18      7.97e-05  36.59

"-" is used to indicate negligible computation times.

Table 7. Comparative results, non-uniform instances, $t_i \in [1, 1000]$.

          LPT            B&B                      HI_PCmax                 ME
 m     n  error     opt  error     opt  time (s)  error     opt  time (s)  error     time (s)
 5    10  0          10  0          10  -         0          10  -         0         -
 5    50  8.82e-03    0  7.94e-03    0  0.20      0          10  0.03      8.92e-03  0.01
 5   100  1.70e-02    0  1.55e-04    9  0.02      0          10  0.02      7.43e-05  0.06
 5   500  6.35e-04    0  0          10  -         0          10  -         0         1.39
 5  1000  1.78e-04    0  0          10  -         0          10  0.02      0         14.80
10    50  4.08e-03    0  1.57e-03    0  0.42      0          10  -         1.46e-02  0.01
10   100  8.66e-03    0  8.18e-03    0  0.39      0          10  0.35      4.64e-03  0.07
10   500  1.01e-02    0  0          10  0.01      0          10  0.07      0         5.61
10  1000  4.12e-04    0  0          10  0.11      0          10  0.06      0         14.61
25    50  0          10  0          10  -         0          10  -         0         0.01
25   100  4.67e-03    0  3.80e-03    0  1.06      1.34e-03    8  1.08      7.93e-03  0.08
25   500  1.00e-02    0  1.39e-03    1  5.53      0          10  0.13      4.25e-05  15.39
25  1000  9.38e-03    0  7.97e-06    9  0.73      0          10  0.43      7.97e-06  138.21

"-" is used to indicate negligible computation times.
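The error statistics of items (a) and (d) can be computed along the following lines. The sketch assumes that the relative error of a solution is (makespan - lower bound) / lower bound and that a solution is counted as optimal when its makespan equals the best lower bound; both conventions, as well as the name `summarize` and the sample numbers, are assumptions rather than definitions taken from the text.

```python
def summarize(makespans, lower_bounds):
    """Average relative error and number of solutions whose makespan equals the bound."""
    pairs = list(zip(makespans, lower_bounds))
    errors = [(c - lb) / lb for c, lb in pairs]
    optimal = sum(1 for c, lb in pairs if c == lb)
    return sum(errors) / len(errors), optimal


avg_error, n_opt = summarize([100, 101, 250], [100, 100, 250])
print(f"{avg_error:.2e}", n_opt)                # 3.33e-03 2
```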

Most uniform instances are easy and can be solved in negligible computation 
times. Tables 3 to 5 show that HLPCmax found better solutions than LPT, B&B, 
and 3-PHASE for all classes of test problems. Only four instances in Table 4 
and 20 instances in Table 5 were not solved to optimality by HLPCmax. B&B 
outperformed LPT and 3-PHASE, but found slightly fewer optimal solutions and 
consequently slightly larger relative errors than HLPCmax. We also notice that 
the uniform test instances get harder when the range of the processing times 
increase. 

The non-uniform test instances are clearly more difficult than the uniform ones.
Once again, HLPCmax outperformed the other algorithms considered in Tables 6
to 8 in terms of solution quality and computation times. This conclusion

Table 8. Comparative results, non-uniform instances, $t_i \in [1, 10000]$.

                LPT               B&B                   HLPCmax                  ME
  m     n     error  opt     error  opt time (s)     error  opt time (s)     error  time (s)
  5    10         0   10         0   10        -         0   10        -         0         -
  5    50  8.79e-03    0  8.11e-03    0     0.28         0   10     0.02  8.95e-03      0.01
  5   100  1.70e-02    0  1.00e-04    8     0.05         0   10     0.01  5.78e-05      0.09
  5   500  6.42e-04    0         0   10        -         0   10        -         0      1.97
  5  1000  1.78e-04    0         0   10        -         0   10     0.03         0     13.88
 10    50  4.12e-03    0  2.02e-03    0     0.54  4.22e-06    8     0.35  1.46e-02      0.01
 10   100  8.61e-03    0  8.28e-03    0     0.52         0   10     0.24  4.63e-03      0.15
 10   500  1.02e-02    0         0   10     0.02         0   10     0.01  1.06e+00      7.99
 10  1000  4.10e-04    0         0   10     0.01         0   10     0.02         0     15.57
 25    50         0   10         0   10        -         0   10        -         0      0.01
 25   100  4.73e-03    0  4.16e-03    0     1.34  1.35e-03    3     4.29  7.76e-03      0.14
 25   500  1.01e-02    0  7.23e-04    1    12.70         0   10     3.13  1.91e-05     20.37
 25  1000  9.40e-03    0  1.86e-06    8     5.55         0   10     0.22  5.31e-06    195.88

A dash (-) is used to indicate negligible computation times.



is particularly true if one compares the results observed for the largest test
instances with m = 25 and n > 250.

Table 9 summarizes the main results obtained by algorithms HLPCmax and
B&B on the same computational environment. For each group of test problems
and for each algorithm, it indicates the number of optimal solutions found over
the 130 instances, the average and maximum absolute errors, the average and
maximum relative errors, and the average and maximum computation times.
The superiority of HLPCmax is clear for the non-uniform instances. It not only
found better solutions, but also did so in smaller computation times.



Table 9. Comparative results: HLPCmax vs. B&B.

HLPCmax
                                absolute error        relative error       time (s)
Instances    $t_i \in$      opt      avg   max        avg        max    avg     max
uniform      [1, 100]       130     0.00     0          0          0      0    0.09
             [1, 1000]      126     0.03     1   1.36e-05   5.15e-04   0.02    0.19
             [1, 10000]     110     0.75    12   3.46e-05   5.83e-04   0.09    0.77
non-uniform  [1, 100]       120     0.32     7   6.79e-04   1.50e-02   0.11    3.76
             [1, 1000]      128     0.38    25   1.03e-04   6.71e-03   0.17    9.49
             [1, 10000]     121     3.90   253   1.04e-04   6.75e-03   0.64   19.62

B&B
                                absolute error        relative error       time (s)
Instances    $t_i \in$      opt      avg   max        avg        max    avg     max
uniform      [1, 100]       130     0.00     0          0          0      0    0.01
             [1, 1000]      126     0.05     3   2.22e-05   1.55e-03   0.01    0.75
             [1, 10000]     107     9.31   173   4.33e-04   8.30e-03   0.07    1.12
non-uniform  [1, 100]        87     1.84    20   2.12e-03   1.50e-02   0.99   35.41
             [1, 1000]       79    15.59   152   1.77e-03   9.42e-03   0.65    9.99
             [1, 10000]      77   150.01   880   1.80e-03   9.73e-03   1.62   23.60




12 A.C.F. Alvim and C.C. Ribeiro 



5 Concluding Remarks 

We proposed a new strategy for solving the multiprocessor scheduling problem, 
based on the application of a hybrid improvement heuristic to the bin packing 
problem. We also presented a new, quick construction heuristic, combining the 
LPT rule with the solution of subset sum problems. 

The construction heuristic proved to be a very effective approximate
algorithm and found optimal solutions for a large number of test problems.
The improvement heuristic outperformed the other approximate algorithms in
the literature, in terms of solution quality and computation times. The
computational results are particularly good in the case of non-uniform test instances.

Acknowledgments: The authors are grateful to M. Dell’Amico for having
kindly provided the code of the B&B algorithm used in the computational
experiments. We are also thankful to P. França for making available the instances of
the uniform family.

Abstract. The problem of finding a fundamental cycle basis with minimum
total cost in a graph is NP-hard. Since fundamental cycle bases
correspond to spanning trees, we propose new heuristics (local search and 
metaheuristics) in which edge swaps are iteratively applied to a current 
spanning tree. Structural properties that make the heuristics efficient are 
established. We also present a mixed integer programming formulation 
of the problem whose linear relaxation yields tighter lower bounds than 
known formulations. Computational results obtained with our algorithms 
are compared with those from existing constructive heuristics on several 
types of graphs. 



1 Introduction 

Let $G = (V, E)$ be a simple, undirected graph with $n$ nodes and $m$ edges,
weighted by a non-negative cost function $w : E \to \mathbb{R}^+$. A cycle is a subset
$C$ of $E$ such that every node of $V$ is incident with an even number of edges in
$C$. Since an elementary cycle is a connected cycle such that at most two edges
are incident to any node, cycles can be viewed as the (possibly empty) union of
edge-disjoint elementary cycles. If cycles are considered as edge-incidence binary
vectors in $\{0,1\}^{|E|}$, it is well known that the cycles of a graph form a vector
space over $GF(2)$. A set of cycles is a cycle basis if it is a basis in this cycle
vector space associated to $G$. The cost of a cycle is the sum of the costs of all
edges contained in the cycle. The cost of a set of cycles is the sum of the costs
of all cycles in the set. Given any spanning tree of $G$ characterized by an edge
set $T \subseteq E$, the edges in $T$ are called branches of the tree, and those in $E \setminus T$
(the co-tree) are called the chords of $G$ with respect to $T$. Any chord uniquely
identifies a cycle consisting of the chord itself and the unique path in $T$
connecting the two nodes incident on the chord. These $m - n + 1$ cycles are called
fundamental cycles and they form a Fundamental Cycle Basis (FCB) of $G$ with
respect to $T$. It turns out [1] that a cycle basis is fundamental if and only if each
cycle in the basis contains at least one edge which is not contained in any other
cycle in the basis. In this paper we consider the problem of finding Minimum
Fundamental Cycle Bases (Min FCB) in graphs, that is FCBs with minimum
total cost. Since the cycle space of a graph is the direct sum of the cycle spaces
of its biconnected components, we assume that $G$ is biconnected, i.e., $G$ contains
at least two edge-disjoint paths between any pair of nodes.

Cycle bases have been used in the field of electrical networks since the time of
Kirchhoff [2]. Fundamental cycle bases can be uniquely identified by their
corresponding spanning trees, and can therefore be represented in a highly compact
manner. Besides the above-mentioned characterization, Syslo established
several structural results concerning FCBs [3,1,4]. For example, two spanning trees
whose symmetric difference is a collection of 2-paths (paths where each node,
excluding the endpoints, has degree 2) give rise to the same FCB [1]. Although
the problem of finding a minimum cycle basis can be solved in polynomial time
(see [5] and the recent improvement [6]), requiring fundamentality makes the
problem NP-hard [7]. In fact, it does not admit a polynomial-time approximation
scheme (PTAS) unless P=NP; that is, under the same assumption there
exists no polynomial-time algorithm that guarantees a solution within a factor
of $1 + \varepsilon$ for every instance and for any $\varepsilon > 0$ [8]. In the same work, a
$(4 + \varepsilon)$-approximation algorithm is presented for complete graphs, and a
$2^{O(\sqrt{\log n \log\log n})}$-approximation algorithm for arbitrary graphs.

Interest in minimum FCBs arises in a variety of application fields, such as 
electrical circuit testing [9], periodic timetable planning [10] and generating min- 
imal perfect hash functions [11]. 

The paper is organized as follows. In Section 2 we describe a local search 
algorithm in which the spanning tree associated to the current FCB is iteratively 
modified by performing edge swaps, and we establish structural results that make 
its implementation efficient. In Section 3 the same type of edge swaps is adopted 
within two metaheuristic schemes, namely a variable neighbourhood search and 
a tabu search. To provide lower bounds on the cost of optimal solutions, a new 
mixed integer programming (MIP) formulation of the problem is presented in 
Section 4. Computational results are reported and discussed in Section 5. 

2 Edge-Swapping Local Search 

In our local search algorithm for the Min FCB problem, we start from the span- 
ning tree associated to an initial FCB. At each iteration we swap a branch of 
the current spanning tree with one of its chords until the cost cannot be further 
decreased, i.e., a local minimum is found. 

2.1 Initial Feasible Solutions 

Initial solutions are obtained by applying a very fast “tree-growing” procedure [12],
where a spanning tree and its corresponding FCB are grown by adding
nodes to the tree according to predefined criteria. The adaptation of Paton’s
procedure to the Min FCB problem proceeds as follows. The node set of the initial
tree $V_T$ only contains a root node $v_0$, and the set $X$ of nodes to be examined is
initialized at $V$. At each step a node $u \in X \cap V_T$ (not yet examined) is selected
according to a predefined ordering. For all nodes $z$ adjacent to $u$, if $z \notin V_T$, the
edge $\{z,u\}$ is included in $T$ (the edge is selected), the node $z$ is added to $V_T$
and the node $u$ is removed from $X$. Nodes to be examined are selected according
to non-increasing degree and, to break ties, to increasing edge star costs. The
resulting order tends to maximize the chances of finding very short fundamental
cycles early in the process. The performance of this tree-growing procedure is
comparable to other existing tree-growing techniques [7,11].
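
As a purely illustrative sketch of the tree-growing step described above (not the implementation evaluated in this paper), the following Python fragment grows a spanning tree from a root node. The function name `grow_tree`, the adjacency-dictionary input and the use of the node label as a last tie-breaker are assumptions made for the example; the fundamental cycles themselves are not recorded here.

```python
def grow_tree(adj, root):
    """Grow a spanning tree of a connected graph from `root`.

    adj maps each node to a dict {neighbour: edge cost}.
    Returns the list of selected tree edges (u, z).
    """
    def priority(u):
        # Non-increasing degree first, then increasing edge star cost;
        # the node label is a hypothetical last tie-breaker.
        return (-len(adj[u]), sum(adj[u].values()), u)

    in_tree = {root}          # V_T
    unexamined = set(adj)     # X, initialised at V
    tree_edges = []           # T
    while unexamined & in_tree:
        u = min(unexamined & in_tree, key=priority)
        for z in sorted(adj[u]):
            if z not in in_tree:
                tree_edges.append((u, z))
                in_tree.add(z)
        unexamined.remove(u)
    return tree_edges


# Unit-cost 4-cycle 0-1-2-3 with the extra edge {0, 2}.
adj = {
    0: {1: 1, 2: 1, 3: 1},
    1: {0: 1, 2: 1},
    2: {0: 1, 1: 1, 3: 1},
    3: {0: 1, 2: 1},
}
tree = grow_tree(adj, root=0)
```

On this small graph the root already reaches every other node, so the tree is the star centred at node 0.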

2.2 Edge Swap 

Using edge swaps to search the solution space of the Min FCB problem is a
good strategy, since all spanning trees of a graph can be obtained from any
initial spanning tree by the repeated application of edge swaps [13]. Consider
any given spanning tree $T$ of $G$. For each branch $e$ of $T$, the removal of $e$ from $T$
induces the partition of the node set $V$ into two subsets $S^e_T$ and $\bar{S}^e_T$. Denote by
$\delta^e_T$ the fundamental cut of $G$ induced by the branch $e$ of $T$, i.e., $\delta^e_T = \delta(S^e_T) =
\{\{u,v\} \in E \mid u \in S^e_T, v \in \bar{S}^e_T\}$. For any chord $f \in \delta^e_T$, let $\pi = (e, f)$ be the
edge swap which consists in removing the branch $e$ from $T$ while adding $f$ to $T$.
Denote by $\pi T$ the resulting spanning tree.



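Let $T$ be the initial spanning tree constructed as in Section 2.1;
loop
    $\Delta_{\mathrm{opt}} := 0$;
    initialize $\pi_{\mathrm{opt}}$ to the identity;
    for all $e \in T$
        for all $f \in \delta^e_T$ with $f \ne e$
            $\pi := (e, f)$;
            if $\Delta_\pi > \Delta_{\mathrm{opt}}$ then
                $\pi_{\mathrm{opt}} := \pi$;
                $\Delta_{\mathrm{opt}} := \Delta_\pi$;
            end if
        end for
    end for
    if $\pi_{\mathrm{opt}}$ is not the identity then
        $T := \pi_{\mathrm{opt}} T$;
    end if
until $\pi_{\mathrm{opt}}$ is the identity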



Fig. 1. Local search algorithm for the Min FCB problem. 



For any spanning tree $T$, let $C(T)$ be the set of cycles in the FCB associated
to $T$, and let $w(C(T))$ denote the FCB total cost (the function $w$ is extended
to sets of edges in the obvious way: $w(F) = \sum_{f \in F} w(f)$ for any $F \subseteq E$). We
are interested in finding edge swaps $\pi = (e, f)$, where $e$ is a branch of the
current tree $T$ and $f$ is a chord in the fundamental cut $\delta^e_T$ induced by $e$, such
that $w(C(\pi T)) < w(C(T))$. For each branch $e$ and chord $f$, the cost difference
$\Delta_\pi = w(C(T)) - w(C(\pi T))$ must be computed. Let $\Delta_{\mathrm{opt}}$ be the largest such
$\Delta_\pi$ and $\pi_{\mathrm{opt}}$ be the corresponding edge swap. If $\Delta_{\mathrm{opt}} \le 0$ we let $\pi_{\mathrm{opt}}$ be the identity
permutation.

The local search algorithm based on this edge-swapping operation is summa- 
rized in Fig. 1. 
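
To make the loop of Fig. 1 concrete, here is a minimal Python sketch of the basic (non-incremental) variant, in which every candidate FCB cost is recomputed from scratch. The names `edge`, `tree_path`, `fcb_cost` and `local_search`, as well as the representation of edges as frozensets, are assumptions of this example and not the authors' implementation.

```python
from collections import deque


def edge(u, v):
    return frozenset((u, v))


def tree_path(tree, u, v):
    """Edges of the unique u-v path in the spanning tree `tree`."""
    adj = {}
    for a, b in map(tuple, tree):
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    parent = {u: None}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        for y in adj.get(x, []):
            if y not in parent:
                parent[y] = x
                queue.append(y)
    path = set()
    while v != u:
        path.add(edge(v, parent[v]))
        v = parent[v]
    return path


def fcb_cost(w, tree):
    """Total cost of the fundamental cycle basis induced by `tree`."""
    total = 0
    for chord in w:
        if chord not in tree:
            u, v = tuple(chord)
            total += w[chord] + sum(w[b] for b in tree_path(tree, u, v))
    return total


def local_search(w, tree):
    """Best-improvement edge swapping until no swap decreases the cost."""
    tree = set(tree)
    while True:
        current = fcb_cost(w, tree)
        best_gain, best_swap = 0, None
        for f in w:
            if f in tree:
                continue
            u, v = tuple(f)
            # f lies in the fundamental cut of e exactly when e is on
            # the tree path joining the endpoints of f.
            for e in tree_path(tree, u, v):
                gain = current - fcb_cost(w, (tree - {e}) | {f})
                if gain > best_gain:
                    best_gain, best_swap = gain, (e, f)
        if best_swap is None:
            return tree, current
        e, f = best_swap
        tree = (tree - {e}) | {f}


# 2 x 3 unit-cost mesh, started from a Hamiltonian path.
pairs = [(0, 1), (1, 2), (3, 4), (4, 5), (0, 3), (1, 4), (2, 5)]
w = {edge(u, v): 1 for u, v in pairs}
start = {edge(0, 1), edge(1, 2), edge(2, 5), edge(4, 5), edge(3, 4)}
tree, cost = local_search(w, start)
```

On this instance the starting tree induces cycles of cost 6 and 4, and a single swap reaches a basis made of the two unit squares.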

2.3 Efficient Implementation 

Given the high worst-case computational complexity of each iteration of the
basic local search procedure (see Section 2.4), an efficient implementation is of
foremost importance. Since applying an edge swap to a spanning tree may change
the fundamental cycles and cut structure considerably, efficient procedures are
needed to determine the cuts $\delta^e_{\pi T}$ for all $e \in \pi T$, and to compute $\Delta_\pi$ from the
data at the previous iteration, namely from $T$, $\pi$ and the cuts $\delta^e_T$, for $e \in T$.



Edge swap effect on the cuts. In this subsection we prove that any edge
swap $\pi = (e, f)$ applied to a spanning tree $T$, where $e \in T$ and $f \in \delta^e_T$, changes
a cut $\delta^h_T$ if and only if $f$ is also in $\delta^h_T$. Furthermore, $\pi(\delta^h_T)$ is the symmetric
difference $\delta^h_T \,\triangle\, \delta^e_T$. This makes it easy to maintain data structures relative to the
cuts that can be updated efficiently when $\pi$ is applied to $T$.

For each pair of nodes $u, v \in V$ let $\langle u, v \rangle$ be the unique path in $T$ from $u$ to
$v$. Let $e = \{u_e, v_e\} \in T$ be an edge of the spanning tree and $c = \{u_c, v_c\} \notin T$
be a chord, where the respective endpoints $u_e, v_e, u_c, v_c$ are nodes in $V$. Let
$p_1(e,c) = \langle u_e, u_c \rangle$, $p_2(e,c) = \langle u_e, v_c \rangle$, $p_3(e,c) = \langle v_e, u_c \rangle$, $p_4(e,c) = \langle v_e, v_c \rangle$ and
$P_T(e,c) = \{p_i(e,c) \subseteq T \mid i = 1, \ldots, 4\}$. Note that exactly two paths in $P_T(e,c)$
do not contain $e$. Let $\bar{P}_T(e,c)$ denote the subset of $P_T(e,c)$ composed of those two
paths not containing $e$. Let $P^*_T(e,c)$ be whichever of the sets $\{p_1(e,c), p_4(e,c)\}$,
$\{p_2(e,c), p_3(e,c)\}$ has shortest total path length in $T$ (see Fig. 2). In the sequel,
with a slight abuse of notation, we shall sometimes say that an edge belongs to a
set of nodes, meaning that its endpoints belong to that set of nodes. For a path
$p$ and a node set $N \subseteq V(G)$ we say $p \subseteq N$ if the edges of $p$ are in the edge set
$E(G_N)$ (i.e., the edges of the subgraph of $G$ induced by $N$). Furthermore, we
shall say that a path connects two edges $e, f$ if it connects an endpoint of $e$ to
an endpoint of $f$.

Lemma 2.1. For any branch $e \in T$ and chord $c \in E \setminus T$, we have $c \in \delta^e_T$ if and
only if $\bar{P}_T(e,c) = P^*_T(e,c)$.

Proof. First assume that $c \in \delta^e_T$. Denoting by $u_c, v_c$ the endpoints of $c$ and
by $u_e, v_e$ those of $e$, we can assume w.l.o.g. that $u_c, v_c, u_e, v_e$ are labeled so
that $u_c, u_e \in S^e_T$ and $v_c, v_e \in \bar{S}^e_T$. Since there is a unique shortest path $q$ in $T$
connecting $u_c$ to $v_c$ with $u_c \in S^e_T$, $v_c \in \bar{S}^e_T$, then $e \in q$. Thus, there are unique
shortest sub-paths $q_1, q_2$ of $q$ such that $q_1 = \langle u_e, u_c \rangle$, $q_2 = \langle v_e, v_c \rangle$ and $q_1 \subseteq$




18 



E. Amaldi et al. 




Fig. 2. (A) If $c$ is in the fundamental cut induced by $e$, $\bar{P}_T(e,c) = P^*_T(e,c) = \{p_1, p_4\}$.
Otherwise, up to symmetries, we have the situation depicted in (B) where $\bar{P}_T(e,c) \ne P^*_T(e,c)$.



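$S^e_T$, $q_2 \subseteq \bar{S}^e_T$. Hence $P^*_T(e,c) = \{q_1, q_2\} = \bar{P}_T(e,c)$. Conversely, let $P^*_T(e,c) =
\{q_1, q_2\}$, and assume that $e \notin q_1, q_2$. Since either $q_1 \subseteq S^e_T$ and $q_2 \subseteq \bar{S}^e_T$ or vice
versa, the endpoints of $c$ are separated by the cut $\delta^e_T$, i.e., $c \in \delta^e_T$. □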

Let $T$ be a spanning tree for $G = (V, E)$ and $\pi = (e, f)$ an edge swap with
$e \in T$, $f \in \delta^e_T$ and $f \ne e$. First we note that the cut in $G$ induced by $e$ of $T$ is
the same as the cut induced by $f$ of $\pi T$.

Proposition 2.2. $\pi(\delta^e_T) = \delta^f_{\pi T}$.

Proof. Since $f \in \delta^e_T$, swapping $e$ with $f$ does not modify the partitions that
induce the cuts, i.e., $S^e_T = S^f_{\pi T}$. □

Second, we show that the cuts that do not contain $f$ are not affected by $\pi$.

Proposition 2.3. For each $h \in T$ such that $h \ne e$, and $f \notin \delta^h_T$, we have
$\pi(\delta^h_T) = \delta^h_T$.

Proof. Let $g \in \delta^h_T$. By Lemma 2.1, the shortest paths $p_1^T, p_2^T$ from the endpoints
of $h$ to the endpoints of $g$ do not contain $h$. We shall consider three possibilities.

(1) In the case where $e$ and $f$ do not belong either to $p_1^T$ or $p_2^T$ we obtain
trivially that $\bar{P}_{\pi T}(e,c) = \bar{P}_T(e,c) = P^*_T(e,c) = P^*_{\pi T}(e,c)$ and hence the result.

(2) Assume now that $e \in p_1^T$, and that both $e, f$ are in $S^h_T$. The permutation $\pi$
changes $p_1^T$ so that $f \in p_1^{\pi T}$, whilst $p_2^{\pi T} = p_2^T$. Now $p_1^{\pi T}$ is shortest because it is
the unique path in $\pi T$ connecting the endpoints of $p_1^T$, and since $h \notin p_1^{\pi T}, p_2^{\pi T}$
because $\pi$ does not affect $h$, we obtain $\bar{P}_{\pi T}(e,c) = P^*_{\pi T}(e,c)$.

(3) Suppose that $e \in p_1^T$, $e \in S^h_T$ and $f \in \bar{S}^h_T$. Since $f \in \delta^e_T$, by Lemma 2.1 there are shortest
paths $q_1^T, q_2^T$ connecting the endpoints of $e$ and $f$ such that $q_1^T \subseteq S^e_T$, $q_2^T \subseteq \bar{S}^e_T$.
Since $e \in S^h_T$, $f \in \bar{S}^h_T$ and $T$ is a tree, there is an $i$ in $\{1, 2\}$ such that $h \in q_i^T$ (say,
w.l.o.g. that $i = 1$); let $q_1^T = r_1^T \cup \{h\} \cup r_2^T$, where $r_1^T \subseteq S^h_T$ connects $h$ and $e$, and
$r_2^T \subseteq \bar{S}^h_T$ connects $h$ and $f$. Let $q^{\pi T} = r_1^T \cup \{e\} \cup q_2^T$; then $q^{\pi T}$ is the unique path
in $S^h_{\pi T}$ connecting $h$ and $f$. Since $r_2^T$ connects $h$ and $f$ in $\bar{S}^h_T$, we must conclude
that $f \in \delta^h_T$, which is a contradiction. □

Third, we prove that any cut containing $f$ is mapped by the edge swap
$\pi = (e, f)$ to its symmetric difference with the cut induced by $e$ of $T$.

Theorem 2.4. For each $h \in T$ such that $h \ne e$ and $f \in \delta^h_T$, we have $\pi(\delta^h_T) =
\delta^h_T \,\triangle\, \delta^e_T$. Due to its length, the proof is given in the Appendix.
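
The three statements above translate directly into an update rule for the stored cuts. The Python sketch below is an illustration under assumed data structures (edges as frozensets, cuts as frozensets of edges that include the inducing branch); the helper names `fundamental_cut` and `update_cuts` are hypothetical.

```python
def edge(u, v):
    return frozenset((u, v))


def fundamental_cut(edges, tree, e):
    """Edges of G (e included) joining the two components of tree - {e}."""
    a, _ = tuple(e)
    side, stack = {a}, [a]
    while stack:
        x = stack.pop()
        for t in tree:
            if t != e and x in t:
                (y,) = t - {x}
                if y not in side:
                    side.add(y)
                    stack.append(y)
    return frozenset(g for g in edges if len(g & side) == 1)


def update_cuts(cuts, e, f):
    """Cuts of the tree obtained by the swap (e, f), from the cuts of T."""
    new_cuts = {f: cuts[e]}                      # Proposition 2.2
    for h, cut in cuts.items():
        if h == e:
            continue
        # Theorem 2.4 if f is in the cut of h, Proposition 2.3 otherwise.
        new_cuts[h] = cut ^ cuts[e] if f in cut else cut
    return new_cuts


# 2 x 3 mesh with a Hamiltonian path as spanning tree.
pairs = [(0, 1), (1, 2), (3, 4), (4, 5), (0, 3), (1, 4), (2, 5)]
edges = [edge(u, v) for u, v in pairs]
tree = {edge(0, 1), edge(1, 2), edge(2, 5), edge(4, 5), edge(3, 4)}
cuts = {e: fundamental_cut(edges, tree, e) for e in tree}
updated = update_cuts(cuts, edge(1, 2), edge(1, 4))
```

Each updated cut costs one symmetric difference of edge sets, instead of a new traversal of the tree.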



Edge swap effect on the cycles. In order to compute $\Delta_\pi$ efficiently, we have
to determine how an edge swap $\pi = (e, f)$ affects the FCB corresponding to the
tree $T$. For each chord $h$ of $G$ with respect to $T$, let $\gamma^h_T$ be the unique fundamental
cycle induced by $h$.

Fact. If $h \notin \delta^e_T$, then $\gamma^h_T$ is unchanged by $\pi$.

The next result characterizes the way $\pi$ acts on the cycles that are changed
by the edge swap $\pi$.

Theorem 2.5. If $h \in \delta^e_T$, then $\pi(\gamma^h_T) = \gamma^h_T \,\triangle\, \gamma^f_T$, where $\gamma^f_T$ is the
fundamental cycle in $T$ corresponding to the chord $f$.

Proof. We need the two following claims.

Claim 1. For all $h \in \delta^e_T$ such that $h \ne e$, $\gamma^h_T \cap \delta^e_T = \{e, h\}$.

Proof. Since $\gamma^h_T$ is the simple cycle consisting of $h$ and the unique path in $T$
connecting the endpoints of $h$ through $e$, the only edges both in the cycle and
in the cut of $e$ are $e$ and $h$.

Claim 2. For all pairs of chords $g, h \in \delta^e_T$ such that $g \ne h$ there exists a unique
simple cycle $\gamma \subseteq G$ such that $g \in \gamma$, $h \in \gamma$, and $\gamma \setminus \{g, h\} \subseteq T$.

Proof. Let $g = (g_1, g_2)$, $h = (h_1, h_2)$ and assume w.l.o.g. $g_1, h_1, g_2, h_2$ are labeled
so that $g_1, h_1 \in S^e_T$ and $g_2, h_2 \in \bar{S}^e_T$. Since there exist unique paths $p \subseteq T$
connecting $g_1, h_1$ and $q \subseteq T$ connecting $g_2, h_2$, the edge subset $\gamma = \{g, h\} \cup p \cup q$
is a cycle with the required properties. Assume now that there is another cycle $\gamma'$
with the required properties. Then $\gamma'$ defines paths $p', q'$ connecting respectively
$g_1, h_1$ and $g_2, h_2$ in $T$. Since $T$ is a spanning tree, $p = p'$ and $q = q'$; thus $\gamma' = \gamma$.

Consider the cycle $\gamma = \gamma^h_T \,\triangle\, \gamma^f_T$. By definition, $e \in \gamma^h_T$, $e \in \gamma^f_T$, $h \in \gamma^h_T$, $f \in \gamma^f_T$.
Since $h, f \in \delta^e_T$, by Claim 1 $h \notin \gamma^f_T$ and $f \notin \gamma^h_T$. Thus $h \in \gamma$, $f \in \gamma$, $e \notin \gamma$.
Consider now $\pi(\gamma^h_T)$. Since $e \in \gamma^h_T$ and $\pi = (e, f)$, $f \in \pi(\gamma^h_T)$. Furthermore, since
$\pi$ fixes $h$, $h \in \pi(\gamma^h_T)$. Hence, by Claim 2, we have that $\pi(\gamma^h_T) = \gamma = \gamma^h_T \,\triangle\, \gamma^f_T$. □
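
One way to read the Fact and Theorem 2.5 together is that the cost difference of a swap only involves the chords in the cut of the removed branch. The Python sketch below illustrates this under assumed data structures (edges as frozensets, cycles as frozensets of edges); the function name `swap_gain` is hypothetical, and the hard-coded example is the 2 x 3 unit-cost mesh with the Hamiltonian path 0-1-2-5-4-3 as spanning tree.

```python
def edge(u, v):
    return frozenset((u, v))


def swap_gain(w, cycles, cut_e, e, f):
    """Decrease in FCB cost produced by the swap (e, f).

    w      : dict edge -> cost
    cycles : dict chord -> frozenset of edges of its fundamental cycle
    cut_e  : fundamental cut induced by the branch e (e included)
    """
    def cost(edge_set):
        return sum(w[g] for g in edge_set)

    gamma_f = cycles[f]
    gain = 0
    for h in cut_e:
        if h == e or h == f:
            continue  # the new chord e inherits the cycle of f
        # Theorem 2.5: the cycle of h becomes its symmetric
        # difference with the cycle of f.
        gain += cost(cycles[h]) - cost(cycles[h] ^ gamma_f)
    return gain


pairs = [(0, 1), (1, 2), (3, 4), (4, 5), (0, 3), (1, 4), (2, 5)]
w = {edge(u, v): 1 for u, v in pairs}
e, f, h = edge(1, 2), edge(1, 4), edge(0, 3)
cycles = {
    f: frozenset({f, edge(1, 2), edge(2, 5), edge(4, 5)}),
    h: frozenset({h, edge(0, 1), edge(1, 2), edge(2, 5),
                  edge(4, 5), edge(3, 4)}),
}
cut_e = frozenset({e, f, h})
gain = swap_gain(w, cycles, cut_e, e, f)
```

Here the cycle of the chord {0, 3} shrinks from six edges to four, while the cycle of {1, 4} is simply taken over by the new chord {1, 2}.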

2.4 Computational Complexity 

We first evaluate the complexity of applying an edge swap to a given spanning
tree and of computing the fundamental cut and cycle structures in a basic
implementation. Computing the cost of a FCB given the associated spanning tree
$T$ is $O(mn)$, since there are $m - n + 1$ chords of $G$ relative to $T$ and each one
of the corresponding fundamental cycles contains at most $n$ edges. To select the
best edge swap available at any given iteration, one has to evaluate the FCB
cost for all the swaps involving one of the $n - 1$ branches $e \in T$ and one of the
(at most $m - n + 1$) chords $f \in \delta^e_T$. Since computing a fundamental cut requires
$O(m)$, the total complexity for a single edge swap is $O(m^3 n^2)$.

In the efficient implementation described in Section 2.3, fundamental cuts and
cycles are computed by using symmetric differences of edge sets, which require
linear time in the size of the sets. Since there are $m$ fundamental cycles of size at
most $n$, and $n$ fundamental cuts of size at most $m$, updating the fundamental cut
and cycle structures after the application of an edge swap $(e, f)$ requires $O(mn)$.
Doing this for each branch of the tree and for each chord in the fundamental cut
induced by the branch leads to an $O(m^2 n^2)$ total complexity.

It is worth pointing out that computational experiments show larger speed- 
ups in the average running times (with respect to the basic implementation) 
than those suggested by the worst-case analysis. 

2.5 Edge Sampling 

The efficient implementation of the local search algorithm described in Fig. 1 is
still computationally intensive, since at each iteration all pairs of tree branches
$e$ and chords $f \in \delta^e_T$ must be considered to select the best available edge swap.
Ideally, we would like to test the edge swap only for a small subset of pairs $e, f$
while minimizing the chances of missing pairs which yield large cost decreases.



Fig. 3. All edge weights are equal to 1 and the numbers indicated on the chords
correspond to the costs of the corresponding fundamental cycles. The cut on the left has a
difference between the cheapest and the most expensive cycles of 10 − 4 = 6; after the
edge swap the difference amounts to 6 − 4 = 2.



A good strategy is to focus on branches inducing fundamental cuts whose
edges define fundamental cycles with “unbalanced” costs, i.e., with a large
difference between the cheapest and the most expensive of those fundamental cycles.
See Fig. 3 for a simple example. This is formalized in terms of an order $<_b$ on
the tree branches. For branches $e_1, e_2 \in T$, we have $e_1 <_b e_2$ if the difference
between the maximum and minimum fundamental cycle costs deriving from edges
in $\delta^{e_1}_T$ is smaller than that deriving from edges in $\delta^{e_2}_T$. Computational experience
suggests that branches that appear to be larger according to the above order
tend to be involved in edge swaps leading to the largest decreases in the FCB cost.

This strategy can be easily adapted to sampling by ordering the branches of
the current spanning tree as above and by testing the candidate edge swaps only
for the first $\alpha$ fraction of the branches, where $0 < \alpha < 1$ is an arbitrary sampling
constant.
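
The sampling rule can be sketched in a few lines of Python. This is only an illustration: the function name `sample_branches`, the input format, the choice to rank the most unbalanced branches first and the rounding up of the sample size are all assumptions, and the data are hypothetical (the first cut merely mirrors the spread of 6 quoted in the caption of Fig. 3).

```python
import math


def sample_branches(cut_cycle_costs, alpha):
    """Return the branches whose swaps are tested, most unbalanced first.

    cut_cycle_costs maps each branch to the costs of the fundamental
    cycles of the chords lying in its fundamental cut.
    """
    def spread(branch):
        costs = cut_cycle_costs[branch]
        return max(costs) - min(costs) if costs else 0

    ranked = sorted(cut_cycle_costs, key=spread, reverse=True)
    # Rounding up and keeping at least one branch are assumptions.
    keep = max(1, math.ceil(alpha * len(ranked)))
    return ranked[:keep]


cut_cycle_costs = {"e1": [4, 6, 8, 10], "e2": [4, 6], "e3": [4, 4]}
selected = sample_branches(cut_cycle_costs, alpha=0.5)
```

With three branches and a sampling constant of 0.5, the two branches with the largest cost spread are retained.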

3 Metaheuristics 

To go beyond the scope of local search and try to escape from local minima, 
we have implemented and tested two well-known metaheuristics: variable neigh- 
bourhood search (VNS) [14] and tabu search (TS) [15]. 



3.1 Variable Neighbourhood Search 

In VNS one attempts to escape from a local minimum $x'$ by choosing another
random starting point in increasingly larger neighbourhoods of $x'$. If the cost of
the local minimum $x''$ obtained by applying the local search from this starting
point is smaller than the cost of $x'$, then $x''$ becomes the new best local minimum and the
neighbourhood size is reset to its minimal value. This procedure is repeated
until a given termination condition is met.

For the Min FCB problem, given a locally optimal spanning tree T' (obtained 
by applying the local search of Fig. 1), we consider a neighbourhood of size p 
consisting of all those spanning trees T that can be reached from T' by applying 
p consecutive edge swaps. A random solution in a neighbourhood of size p is 
then obtained by generating a sequence of p random edge swaps and applying 
it to T' . 
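
The VNS scheme can be summarized by the following generic Python skeleton. It is a sketch under stated assumptions rather than our implementation: the termination condition is a fixed iteration count, the neighbourhood size is capped at `p_max`, and a toy one-dimensional problem (with the hypothetical functions `cost`, `descend` and `shake`) stands in for spanning trees and edge swaps.

```python
import random


def vns(x0, cost, local_search, shake, p_max, iterations, rng):
    """Variable neighbourhood search around the best local minimum found."""
    best = local_search(x0)
    p = 1
    for _ in range(iterations):
        candidate = local_search(shake(best, p, rng))
        if cost(candidate) < cost(best):
            best, p = candidate, 1    # improvement: reset neighbourhood size
        else:
            p = min(p + 1, p_max)     # otherwise enlarge it (capped, an assumption)
    return best


# Toy problem: local minima at the multiples of 4, global minimum at 0.
def cost(x):
    return abs(x) + (0 if x % 4 == 0 else 3)


def descend(x):
    """Move to the better neighbour x - 1 or x + 1 while the cost decreases."""
    while True:
        y = min((x - 1, x + 1), key=cost)
        if cost(y) >= cost(x):
            return x
        x = y


def shake(x, p, rng):
    """Apply p random unit moves, mimicking p random edge swaps."""
    for _ in range(p):
        x += rng.choice((-1, 1))
    return x


start = descend(10)
result = vns(10, cost, descend, shake, p_max=6, iterations=200,
             rng=random.Random(0))
```

For the Min FCB problem, `local_search` would be the procedure of Fig. 1 and `shake` would apply $p$ random edge swaps to the current tree.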



3.2 Tabu Search 

Our implementation of tabu search includes diversification steps à la VNS (vTS).
In order to escape from local minima, an edge swap that worsens the FCB cost
is applied to the current solution and inserted in a tabu list. If all possible edge
swaps are tabu or a pre-determined number of successive non-improving moves
is exceeded, $t$ random edge swaps are applied to the current spanning tree. The
number $t$ increases until a pre-determined limit is reached, and is then reset to
1. The procedure runs until a given termination condition is met.

Other TS variants were tested. In particular, we implemented a “pure” TS
(pTS) with no diversification, and a fine-grained TS (fTS) where, instead of
forbidding moves (edge swaps), feasible solutions are forbidden by exploiting
the fact that spanning trees can be stored in a very compact form. We also
implemented a TS variant with the above-mentioned diversification steps where
pTS tabu moves and fTS tabu solutions are alternately considered. Although
the results are comparable on most test instances, vTS performs best on average.
Computational experiments indicate that diversification is more important than
intensification when searching the Min FCB solution space with our type of edge
swaps.

4 Lower Bounds

A standard way to derive a lower bound on the cost of the optimal solutions of 
a combinatorial optimization problem (and thus to estimate heuristics perfor- 
mance) is to solve a linear relaxation of a (mixed) integer programming formu- 
lation. Three different integer programming formulations were discussed in [16]. 

We now describe an improved formulation that uses non-simultaneous flows
on arcs to ensure that the cycle basis is fundamental. Consider a biconnected
graph $G = (V, E)$ with a non-negative cost $w_{ij}$ assigned to each edge $\{i,j\} \in E$.
For each node $v \in V$, $\delta(v)$ denotes the node star of $v$, i.e., the set of all edges
incident to $v$. Let $G_0 = (V, A)$ be the directed graph associated with $G$, namely
$A = \{(i,j), (j,i) \mid \{i,j\} \in E\}$. We use two sets of decision variables. For each edge
$\{k,l\} \in E$, the variable $x^{kl}_{ij} \ge 0$ represents the flow through arc $(i,j) \in A$ from
$k$ to $l$. Moreover, for each edge $\{i,j\} \in E$, the variable $z_{ij}$ is equal to 1 if edge
$\{i,j\}$ is in the spanning tree of $G$, and equal to 0 otherwise. For each pair of
arcs $(i,j) \in A$ and $(j,i) \in A$, we define $w_{ji} = w_{ij}$.

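The following MIP formulation of the Min FCB problem provides much
tighter bounds than those considered in [16]:

\begin{align}
\min \quad & \sum_{\{k,l\} \in E} \sum_{(i,j) \in A} w_{ij}\, x^{kl}_{ij} + \sum_{\{i,j\} \in E} (1 - 2 z_{ij})\, w_{ij} \tag{1} \\
\text{s.t.} \quad & \sum_{j \in \delta(k)} \left( x^{kl}_{kj} - x^{kl}_{jk} \right) = 1 && \forall \{k,l\} \in E \tag{2} \\
& \sum_{j \in \delta(i)} \left( x^{kl}_{ij} - x^{kl}_{ji} \right) = 0 && \forall \{k,l\} \in E,\ \forall i \in V \setminus \{k,l\} \tag{3} \\
& x^{kl}_{ij} \le z_{ij} && \forall \{k,l\} \in E,\ \forall \{i,j\} \in E \tag{4} \\
& x^{kl}_{ji} \le z_{ij} && \forall \{k,l\} \in E,\ \forall \{i,j\} \in E \tag{5} \\
& \sum_{\{i,j\} \in E} z_{ij} = n - 1 \tag{6} \\
& x^{kl}_{ij} \ge 0 && \forall \{k,l\} \in E,\ \forall (i,j) \in A \notag \\
& z_{ij} \in \{0, 1\} && \forall \{i,j\} \in E. \notag
\end{align}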



For each edge $\{k,l\} \in E$, a path $p$ from $k$ to $l$ is represented by a unit
of flow through each arc $(i,j)$ in $p$. In other words, a unit of flow exits node
$k$ and enters node $l$ after going through all other (possible) nodes in $p$. For
each edge $\{k,l\} \in E$, the flow balance constraints (2) and (3) account for a
directed path connecting nodes $k$ and $l$. Note that the flow balance constraint
for node $l$ is implied by constraints (2) and (3). Since constraints (4) and (5)
require that $z_{ij} = 1$ for every edge $\{i,j\}$ contained in some path (namely with a
strictly positive flow), the $z$ variables define a connected subgraph of $G$. Finally,
constraint (6) ensures that the connected subgraph defined by the $z$ variables is
a spanning tree. The objective function (1) adds the cost of the path associated
to every edge $\{k,l\} \in E$ and the cost of all tree chords, and subtracts from it
the cost of the tree branches (which are counted when considering the path for
every edge $\{k,l\}$).

Despite the quality of the linear relaxation bounds, the main shortcoming of
this formulation is that it contains a large number of variables and constraints
and hence its solution becomes cumbersome for the largest instances.

5 Some Computational Results 

Our edge-swapping local search algorithm and metaheuristics have been
implemented in C++ and tested on three types of unweighted and weighted graphs.
CPU times refer to a Pentium 4 2.66 GHz processor with 1 GB RAM running
Linux.



5.1 Unweighted Mesh Graphs 

One of the most challenging testbeds for the Min FCB problem is given by
the square $n \times n$ mesh graphs with unit costs on the edges. This is due to the
large number of symmetries in these graphs, which bring about many different
spanning trees with identical associated FCB costs. Uniform cost square mesh
graphs have $n^2$ nodes and $2n(n - 1)$ edges. Table 1 reports the FCB costs and
corresponding CPU times of the solutions found with: the local search algorithm
(LS) of Fig. 1, the variant with edge sampling (Section 2.5), the NT-heuristic
cited in [11], and the VNS and tabu search versions described in Section 3. For
LS with edge sampling, computational experiments indicate that a sampling
constant of 0.1 leads to a good trade-off between solution quality and CPU time
for this type of graph. The lower bounds in the last column correspond to the
cost of a non-fundamental minimal cycle basis, that is to four times the number
of cycles in a basis: $4(m - n + 1)$, where $m$ and $n$ here denote the numbers of
edges and nodes of the graph. For this particular type of graph, the linear
relaxation of the MIP formulation provides exactly the same lower bounds.
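
As a quick check of this bound, the short Python sketch below evaluates it for a mesh of side $k$ (the function name `mesh_lower_bound` and the symbol $k$ are introduced only for this example). Substituting $k^2$ nodes and $2k(k-1)$ edges gives $4(2k(k-1) - k^2 + 1) = 4(k-1)^2$.

```python
def mesh_lower_bound(k):
    """Lower bound 4(m - n + 1) for the k x k unit-cost mesh graph."""
    nodes = k * k
    edges = 2 * k * (k - 1)
    return 4 * (edges - nodes + 1)


bounds = {k: mesh_lower_bound(k) for k in (5, 10, 50)}
```

The values obtained for sides 5, 10 and 50 are 64, 324 and 9604, which agree with the Bound column of Table 1.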

5.2 Random Euclidean Graphs 

To assess the performance of our edge-swapping heuristics on weighted graphs,
we have generated simple random biconnected graphs. The nodes are positioned
uniformly at random on a 20 x 20 square centered at the origin. Between each
pair of nodes an edge is generated with probability $p$, with $0 < p < 1$. The cost of
an edge is equal to the Euclidean distance between its endpoints. For each
$n$ in $\{10, 20, 30, 40, 50\}$ and $p$ in $\{0.2, 0.4, 0.6, 0.8\}$, we have generated a random
graph of size $n$ with edge probability $p$.

Table 2 reports the results obtained with the edge-swapping heuristics (pure
local search and metaheuristics) on these random graphs. The first two columns
indicate the performance of LS in terms of FCB cost and CPU time. The next
two columns correspond to the lower bounds obtained by partially solving the
MIP formulation of Section 4. The third and fourth two-column groups indicate
the performances of VNS and TS. There was enough available data to ascribe

Table 1. Computational results (FCB cost and CPU times (h:mm:ss)) for $n \times n$ mesh
graphs having unit edge costs, obtained with different heuristics. The VNS and TS
metaheuristics were run for 10 minutes (after finding the first local optimum). Values
marked with * denote an improved value with respect to LS. Lower bounds on the
optimal value are reported in the last column.

             LS      LS with edge sampling      NT [11]        VNS      TS   Bound
  n    Cost     Time        Cost      Time    Cost     Time   Cost    Cost    Cost
  5      72  0:00:00          74   0:00:00      78  0:00:00     72      72      64
 10     474  0:00:00         524   0:00:00     518  0:00:00    466*    466*    324
 15    1318  0:00:00        1430   0:00:00    1588  0:00:00   1280*   1276*    784
 20    2608  0:00:03        3186   0:00:00    3636  0:00:00   2572*   2590*   1444
 25    4592  0:00:16        5152   0:00:02    6452  0:00:00   4464*   4430*   2304
 30    6956  0:00:47        8488   0:00:03   11638  0:00:00   6900*   6882*   3364
 35   10012  0:02:19       11662   0:00:08   16776  0:00:00   9982*   9964*   4624
 40   13548  0:06:34       15924   0:00:26   28100  0:00:01  13524*  13534*   6084
 45   18100  0:14:22       22602   0:01:00   35744  0:00:01  18100   18100    7744
 50   23026  0:31:04       33274   0:01:10   48254  0:00:03  23026   23552    9604


some statistical significance to the average percentage gap between heuristic and 
lower bounding values (8.19%), and its reassuringly low standard deviation (± 
5.15%). The maximum frequency value is also a rather low value (6%). It is worth 
pointing out that the lower bounds obtained by solving the linear relaxation of 
the formulation presented in Section 4 are generally much tighter than those 
derived from the formulations considered in [16]. 

5.3 Weighted Graphs from Periodic Timetabling 

An interesting application of Min FCB arises in periodic timetabling for
transportation systems. To design the timetables of the Berlin underground, Liebchen
and Möhring [10] consider the mathematical programming model based on the
Periodic Event Scheduling Problem (PESP) [17] and the associated graph G in 
which nodes correspond to events. Since the number of integer variables in the 
model can be minimized by identifying an FCB of G and the number of discrete 
values that each integer variable can take is proportional to the total FCB cost, 
good models for the PESP problem can be obtained by looking for minimum 
fundamental cycle bases of the corresponding graph G. 

Due to the way the edge costs are determined, the Min FCB instances aris- 
ing from this application have a high degree of symmetry. Such instances are 
difficult because, at any given heuristic iteration, a very large number of edge 
swaps may lead to FCBs with the same cost. Notice that this is generally not 
the case for weighted graphs with uniformly distributed edge costs. The re- 
sults reported in Table 3 for instance timtab2, which is available from MIPLIB 
(http://miplib.zib.de) and contains 88 nodes and 316 edges, are promising. 
According to practical modeling requirements, certain edges are mandatory and 
must belong to the spanning tree associated to the Min FCB solution. The 
Table 2. Computational results (FCB costs, CPU times (mm:ss), lower bounds) for
Euclidean random graphs. Lower bound values marked with + denote an optimal solution
(MIP solved to optimality). FCB costs are marked with * when the metaheuristic
improved on the value found by LS. Missing values are due to excessive CPU timings.

p = 0.2
  n    LS          CPU time   Bound       CPU time   VNS         CPU time   TS          CPU time
 10    216.698+    0          216.698+    0          216.698+    0          216.698+    0
 20    1052.38+    0          1052.38+    0:56       1052.38+    0          1052.38+    0
 30    3315.89     0          2750.92     0:28       3111.71*    0:14       3315.89     0
 40    4634.04     0          4065.187    16:58      4504.84*    0:22       4633.45*    0
 50    7007.34     0:01       6448.711    2:38:51    6991.53*    1:11       7007.34     0:02

p = 0.4
  n    LS          CPU time   Bound       CPU time   VNS         CPU time   TS          CPU time
 10    472.599     0          459.305+    0:02       459.305+    0          472.599     0
 20    2021.82     0          1894.747    0:08       2021.37*    0:04       2021.37*    0
 30    4467.13     0          4265.6      22:56      4455.2*     0:29       4455.2*     0:01
 40    7685.97     0:01       -           -          7648*       1:46       7684.53*    0:02
 50    11096.8     0:05       -           -          11022.8*    9:32       11073.4*    0:12

p = 0.6
  n    LS          CPU time   Bound       CPU time   VNS         CPU time   TS          CPU time
 10    581.525     0          547.406+    0:08       547.406+    0          547.406+    0
 20    2776.22     0          2627.558    0:59       2756.6*     0:08       2756.6*     0
 30    7031.2      0          6445.83     39:32      6979.15*    1:13       7031.2      0:03
 40    11686.0     0:02       -           -          11513*      6:40       11683.4*    0:04
 50    19387.3     0:10       -           -          19174.1*    7:06       19174.1*    1:06

p = 0.8
  n    LS          CPU time   Bound       CPU time   VNS         CPU time   TS          CPU time
 10    992.866     0          775.838+    0:26       775.838+    0          775.838+    0
 20    3478.11     0          3164.9      2:31       3383.45*    0:13       3383.45*    0:02
 30    8971.78     0:01       7823.848    1:43:05    8384.32*    2:42       8930.17*    0:02
 40    14946.4     0:07       -           -          14870.7*    5:30       14902.2*    0:16
 50    25349.9     0:12       -           -          25061.2*    31:55      25245.5*    0:53

above-mentioned instance contains 80 mandatory edges out of 87 tree branches,
and most of these 80 fixed edges have very high costs. As shown in Table 3
(instance liebchen-fixed), this additional condition leads to FCBs
with substantially larger costs.

6 Concluding Remarks 

We described and investigated new heuristics, based on edge swaps, for tackling 
the Min FCB problem. Compared to existing tree-growing procedures, our lo- 
cal search algorithm and simple implementation of the VNS and Tabu search 
metaheuristics look very promising, even though computationally more inten- 
sive. We established structural results that allow an efficient implementation of 
Table 3. Computational results for Liebchen's instance. Missing values are due to a
missing implementation of the corresponding algorithm which deals with mandatory
spanning tree edges. Values marked with * correspond to an improvement with respect
to the LS solution.

                  Local search        NT [11]     VNS                TS                 Lower bound
Instance          FCB cost   Time     FCB cost    FCB cost   Time    FCB cost   Time    FCB cost
liebchen          40520      0.7s     50265       39801*     30s     39841*     30s     31220.534
liebchen-fixed    46072      0.13s    -           -          -       46002*     30s     39907.96

the proposed edge swaps. We also presented a new MIP formulation whose lin- 
ear relaxation provides tighter lower bounds than known formulations on several 
classes of graphs. 
Appendix: Proof of Theorem 2.4

To establish that, for each h G T such that h ^ e and / € 5l^, we have 7 t(( 5^) = 
we proceed in four steps. 

We prove that: (1) g G C\ ^ g ^ 7t(5^), (2) g ^ Stu 6^ ^ g ^ 

(3) g G ^gG Tr{S!^), and (4) g G S^\S!^ ^gG 7t((5^). 

When there is no ambiguity, is written 5®. 

Claim 1: g G 6^ 5^ ^ g ^ 

Proof. Since g G 6^ there are shortest paths pJ,P 2 connecting g,h and not 





Fig. 4. Claim 1: If g G <5^ O J® then g 0 7t(J^). 



containing h, such that pf C S!f and C S!f. Since g G 5®, either pf or 
P 2 must contain e, but not both. Assume w.l.o.g. that e G pi, e ^ p 2 (i.e., 
e G S!f). Thus 7T sends pj to a path pf'^ containing /, whereas = pj. Thus 
Pfrp(h,g) = {Pi^,P 2 ^}. Since / G 6!f, there exist shortest paths qf C Slf and 
02 ^ >^T connecting f,h and not containing h. Because C S!f, e ^ In 
ttT, Q 2 can be extended to a path q'^'^ = q'^ ij {/}. By Proposition 2.2 g G S^j,. 
Thus in ttT there exist paths in S^j. = and in S^j. = connecting 
/, g and not containing /. Notice that the path q'^'^ U connects the endpoint 
of h in S!f and g, and pj^ connects the same endpoint of h with the opposite 
endpoint of g. Thus p^^ = {h} U U rf’^, which means that h G pf^ , i.e., 
Pfrp{h,g) ^ pTrT{h,g). By Lemma 2.1, the claim is proved. 
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Claim 2: g ^ 6^ U S'^ ^ g ^ 

Proof. By hypothesis, g G S^dSlf or g £ S^dSlf. Assume the former w.l.o.g. Let 




Fig. 5. Claim 2: If g 0 5^ U 5® then g ^ 



Pi ,P 2 be unique shortest paths from h to g, and assume h £ pf. Thus P 2 C 
Since g ^ 6^, there are unique shortest paths qf,q 2 from e to g such that e is 
in one of them, assume e £ qf . Since both h,e G T, there is a shortest path 
between one of the endpoints of h and one of the endpoints of e, while the 
opposite endpoints are linked by the path {/i} U U {e}. Suppose e £ pf . Then 
since both endpoints of g are reachable from e via qf ,q 2 , and e is reachable from 
h through r^, it means that e G pj ■ Conversely, if e ^ pj , then e ^ pj ■ Thus, we 
consider two cases. If e is not in the paths from h to g, then tt fixes those paths, 
i.e., h G and h ^ that is p ^ 5’^. If e is in the paths from h to g, then 
both the unique shortest paths p^^ and pj^ connecting h and g in ttT contain 
/. Since g ^ Slj, = S^, there are shortest paths connecting f to g one 

of which, say sf'^, contains /. Moreover, since both h, f £ T, there is a shortest 
path connecting one of the endpoints of h to one of the endpoints of /, the 
other shortest path between the opposite endpoints being {h}Uu'^'^U{f}. Thus, 
either p 5^"^ = andpj^ = {/i}U'u'^^U{/}Us5^, orpj^ = m’^^U{/}Us 2 ^ 

and P 2 ^ = {ft-} U U . Either way, one of the paths contains ft. By Lemma 
2.1, the claim is proved. 

Claim 3: g £ ^ g £ 

Proof Since g £ 6^, there are shortest paths p^ C Slf,p^ C S!f connecting ft 
and g, none of which contains ft. Assume w.l.o.g. e G Slf. Suppose e G pf , say 
Pi = U (ej U r'^ . Consider U {ft} U p^ and r^. These are a pair 

of shortest paths connecting e and g such that e does not belong to either; i.e., 
g G ftfn, which contradicts the hypothesis. Thus e ^ pf, i.e., tt fixes paths 7rf ,7r|’; 
thus P^T{h,g) = pT{h,g) = Pf{h,g) = -P^T(ft,p)j which proves the claim. 

Claim 4: p G S^\6^ ft G 

Proof First consider the case where e,p G S!f. Since p ^ S!f, the shortest paths 
Pi ,P 2 connecting ft, p are such that one of them contains ft, say ft G Pi, whilst 
P 2 C S!f. Since p G ft®, there are shortest paths qf , pj entirely in S!f, connecting 
e, p such that neither contains e. Since both e,h £ T there is a shortest path 
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Fig. 6. Claim 3: If g G and g 0 5®, then g G 




h h 



Fig. 7. Claim 4: If g ^ 5^ and g G 5", then y G 

C connecting an endpoint of h to an endpoint of e, the opposite endpoints 
being joined by {h}Ur^ U {e}. Thus w.l.o.g. pj = {/ijUr^UlejUgf . Since the 
path U Q 2 does not include e and connects h, g, e ^ P 2 ■ Thus and 

h ^ pj^. On the other hand tt sends pj to a unique shortest path pj^^ connecting 
h,g that includes /. Since / G (5^, there are shortest paths sf C C 5^ 

that do not include h. Since pj Q S!}^, sf may only touch the same endpoint of h 
as P 2 - Thus the endpoint of h touched by Pi also originates sj. Since S 2 C S!j^, 
h ^ 82 - Since p^'^ joins h,g, contains / and is shortest, p^'^ = sf U {/} U u, 
where is a shortest path from f to g (which exists because by Proposition 
2.2 g G which shows that h ^ Pi^. Thus by Lemma 2.1, g G 7t(( 5^). The 
second possible case is that e G S'^,p G Since g G there are shortest paths 
pf ,P 2 connecting e, g such that neither includes e. Assume w.l.o.g. h G S^. Since 
e,g are partitioned by <5^, exactly one of pf , p^ includes h (say h G pf , which 
implies pj C . Let qf be the sub-path of pf joining h and g and not including 
h, and let be the sub-path of pf joining h and e and not including h. Let 
^2 = U {e} Up 2 - We have that q 2 is a shortest path joining h, g not including 
h. Thus pT{h,g) = {gf,?!'} = by Lemma 2.1 g G <5^, which is a 

contradiction. □ 
Abstract. Real-world problems usually have to deal with some
uncertainties. This is particularly true for the planning of services whose
requests are unknown a priori.

Several approaches for solving stochastic problems are reported in the 
literature. Metaheuristics seem to be a powerful tool for computing good 
and robust solutions. However, the efficiency of algorithms based on Lo- 
cal Search, such as Tabu Search, suffers from the complexity of evaluating 
the objective function after each move. 

In this paper, we propose alternative methods of dealing with uncertain- 
ties which are suitable to be implemented within a Tabu Search frame- 
work. 



1 Introduction 



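Consider the following deterministic linear program:

$$\text{LP}: \quad \min\ z = \sum_{j=1}^{n} c_j x_j$$

$$\text{s.t.} \quad \sum_{j=1}^{n} a_{ij} x_j \le b_i, \quad i = 1, \dots, m \tag{1a}$$

$$x_j \ge 0, \quad j = 1, \dots, n. \tag{1b}$$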

The cost coefficients cj, the technological coefficients aij, and the right-hand 
side values bi are the problem parameters. In practical applications, any or all of 
these parameters may not be precisely defined. When some of these parameters 
are modelled as random variables, a stochastic problem arises. 

When Cj, aij or bi are random variables having a known joint probability 
distribution, z is also a random variable. When only the coefficients Cj are ran- 
dom, the problem can be formulated as the minimization of the expected value 
of z. Otherwise, a stochastic programming approach must be used. Two main 
variants of stochastic programming (see e.g. [4]) are stochastic programming
with recourse and chance-constrained programming.

Charnes and Cooper [5] proposed to replace constraints (1a) with a number
of probabilistic constraints. Denoting by P(.) the probability of an event, we
consider the probability that constraint i is satisfied, i.e.:

$$P\Big(\sum_{j=1}^{n} a_{ij} x_j \le b_i\Big).$$

Let $\alpha_i$ be the maximum allowable probability that constraint i is violated. A
chance-constrained programming formulation of LP, say CCP, can be
obtained by replacing (1a) with the following chance constraints:

$$P\Big(\sum_{j=1}^{n} a_{ij} x_j \le b_i\Big) \ge 1 - \alpha_i, \quad i = 1, \dots, m. \tag{2}$$



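When all $c_j$'s are known, this formulation minimizes z while preventing the
probability that each constraint is violated from exceeding its threshold $\alpha_i$.

Moving constraints (2) to the objective function via Lagrangean multipliers $\lambda_i$,
we obtain the following stochastic program:

$$\text{SPR}: \quad \min\ E\Big[\sum_{j=1}^{n} c_j x_j\Big] + \sum_{i=1}^{m} \lambda_i\, P\Big(\sum_{j=1}^{n} a_{ij} x_j > b_i\Big)$$

$$\text{s.t.} \quad x_j \ge 0, \quad j = 1, \dots, n.$$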



Using the Lagrangean multipliers, SPR directly considers the cost of recourse, 
i.e. the cost of bringing a violated constraint to feasibility. 

When the deterministic mathematical model also involves binary and/or
integer variables, as in many real applications, the complexity of the associated
stochastic program increases. To address this, several solution approaches are
reported in the literature, such as the Integer L-Shaped Method [12], heuristics (see
e.g. those for the Stochastic Vehicle Routing Problem [7,8]), methods based on a
priori optimization [3] or sample-average approximation [13].

In this paper we propose a new algorithmic approach which combines Tabu
Search and simulation for Chance-Constrained Programming. Glover and Kelly
have described the benefits of applying Simulation to the solution of NP-hard
problems [9]. Tabu Search [10] is a well-known metaheuristic algorithm which
has proved effective in a great number of applications.

The paper is organized as follows. The motivations and the basic idea of
combining Tabu Search and Simulation are presented and discussed in section 2.
In section 3 we introduce two optimization problems which are used to evaluate
the efficiency of the proposed algorithms. Section 4 describes the algorithms for
the two problems. Section 5 reports on the design and the results of the
computational experiments. Finally, ongoing work is discussed in section 6.



2 Motivations and Basic Ideas 

In the following, we refer to the general model IP derived from LP by requiring the
$x_j$'s to be integer.

Tabu Search (TS) explores the solution space by moving at each iteration to
the best neighbor of the current solution, even if the objective function is not
improved. In order to avoid cycling, each move is stored in a list of tabu moves
for a number of iterations: a tabu move is avoided until it is deleted from the
list. A basic scheme of the TS algorithm, say BTS, is depicted in Algorithm 1.



Algorithm 1 BTS

k := 1;
x := InitialSol();
while (not stop) do
    N(x) := NeighborhoodOf(x);
    N^(k)(x) := N(x) \ T^(k)(x) ∪ A^(k)(x);
    x' := BestOf(N^(k)(x));
    x := x';
    k := k + 1;
end while



Note that T^(k)(x) is the set of tabu solutions generated by x using tabu
moves at iteration k, while A^(k)(x) is the set of tabu solutions which are nevertheless
evaluated since they satisfy some aspiration criteria (e.g. their objective function value
improves that of the current best solution). The algorithm usually stops after a given
number of iterations or after a number of non-improving iterations.
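The scheme of Algorithm 1 can be illustrated with a minimal Python sketch. It makes several simplifying assumptions that are not part of the paper: the tabu list stores recently visited solutions rather than moves, the aspiration criterion is "strictly better than the best solution found so far", and the names `basic_tabu_search`, `neighbors`, `cost`, `tenure` and `max_iters` are hypothetical.

```python
from collections import deque


def basic_tabu_search(initial, neighbors, cost, tenure=5, max_iters=100):
    """Illustrative tabu search: move to the best admissible neighbour each iteration."""
    current = best = initial
    tabu = deque(maxlen=tenure)  # short-term memory of recently visited solutions
    for _ in range(max_iters):
        candidates = []
        for cand in neighbors(current):
            # Aspiration: a tabu solution is still admitted if it beats the best so far.
            if cand not in tabu or cost(cand) < cost(best):
                candidates.append(cand)
        if not candidates:
            break
        # The best admissible neighbour is accepted even if it worsens the objective.
        current = min(candidates, key=cost)
        tabu.append(current)
        if cost(current) < cost(best):
            best = current
    return best
```

For example, minimizing (x - 7)^2 over the integers with neighbors x - 1 and x + 1, the search walks from 0 to 7 and then keeps moving away (because returning is tabu) while remembering 7 as the best solution.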

From a computational point of view, the computation of N{x) and its eval- 
uation (the choice of the best move) are the most time consuming components. 
For instance, a linear running time to evaluate the objective function of a single 
move is usually considered acceptable for a deterministic problem. 

Unfortunately, this is not always true for stochastic programs. If we consider
both SPR and CCP models, we observe that a move evaluation requires
computing a rather complex probability function. For instance, in [8], the authors
proposed a SPR formulation for Vehicle Routing Problem with Stochastic De- 
mands and Customers: the proposed TS algorithm, tabustoch, requires at least 
O(n^) to evaluate a single move, where n is the number of demand locations. 
More generally, the evaluation of a new move involves probability and, at least, 
two stages of computation [4] . 

Our main concern is to reduce the computational complexity required for
neighborhood exploration by introducing simulation methods within the TS
framework for solving CCP programs.

The idea is based on a different way of dealing with random parameters:
instead of computing directly the probability function, which is computationally
expensive, we use simulation to evaluate random parameters. Then, we use these
simulated random parameters, within the TS framework, in order to avoid moves
which lead to unfeasible solutions, i.e. moves which violate the
chance-constraints (2).
For the sake of simplicity, we assume that only the $b_i$'s are random. Clearly,
the following remarks can be extended straightforwardly to the other problem
parameters. In order to simulate random parameters, we introduce the following
notation:

$x^{(k)}$ = the variable x at the k-th iteration,

$\tilde b_i^{(t)}$ = the t-th simulated value of $b_i$ (t = 1, ..., T),

$$\delta_i^{(k,t)} = \tilde b_i^{(t)} - \sum_{j} a_{ij} x_j^{(k)}. \tag{3}$$

Let $\tilde S_i^{(k)}$ be given as follows:

$$\tilde S_i^{(k)} = \sum_{t=1}^{T} S_i^{(k,t)}, \quad \text{where} \quad S_i^{(k,t)} = \begin{cases} 1 & \text{if } \delta_i^{(k,t)} \ge 0 \\ 0 & \text{otherwise.} \end{cases} \tag{4}$$

The value of $\tilde S_i^{(k)}$ counts the number of successes for the i-th constraint (i.e.
the constraint is satisfied). The value $\tilde S_i^{(k)} / T$ estimates the probability of
constraint i to be satisfied at iteration k.
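The estimate in (4) amounts to a plain Monte Carlo frequency count. The following Python sketch shows it for a single constraint, assuming only the right-hand side is random; the function name `estimate_satisfaction` and the sampler argument `sample_b` are hypothetical and stand in for whatever distribution the $b_i$ follow.

```python
import random


def estimate_satisfaction(a_row, x, sample_b, T, rng):
    """Return S/T: the fraction of T simulated b values with b - sum_j a_j x_j >= 0."""
    lhs = sum(a * xj for a, xj in zip(a_row, x))
    successes = 0
    for _ in range(T):
        b_sim = sample_b(rng)   # t-th simulated right-hand side value
        if b_sim - lhs >= 0:    # non-negative slack: the constraint is satisfied
            successes += 1
    return successes / T
```

A possible use is `estimate_satisfaction([1, 2], [2, 3], lambda r: r.gauss(10, 1), 100, random.Random(0))`, where the Gaussian right-hand side is purely an example.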

Taking into account CCP models, we are interested in computing solutions
such that the chance-constraints (2) are satisfied for a given probability.

In this case, we can introduce a concept similar to that of a tabu move. The
idea is to avoid all the moves leading to solutions which make the
respective chance-constraint unfeasible. More formally, a move is probably tabu at iteration
k if

$$\frac{\tilde S_i^{(k)}}{T} < 1 - \alpha_i, \quad i = 1, \dots, m. \tag{5}$$

Let $P^{(k)}(x)$ be the set of probably tabu solutions generated by x at iteration
k. Then, the corresponding TS algorithm, say SIMTS-CCP, can be obtained from
Algorithm 1 by modifying the computation of $N^{(k)}(x)$ as

$$N^{(k)}(x) := N(x) \setminus T^{(k)}(x) \setminus P^{(k)}(x) \cup A^{(k)}(x). \tag{6}$$

The SIMTS-CCP procedure is sketched in Algorithm 2. 

Finally, TS offers researchers great flexibility. For instance, a common
practice is to use a simple neighborhood structure and a penalized objective
function to take into account the unfeasibility when some constraints are violated
(see e.g. [6]). A general form of penalized function can be the following:

$$z + \sum_{i} \beta_i\, p_i(x) \tag{7}$$

where $\beta_i > 0$ is usually a self-adjusting penalty coefficient and $p_i(x) \ge 0$ is a
measure of how much the i-th constraint is unfeasible.
Algorithm 2 SIMTS-CCP

k := 1;
x := InitialSol();
while (not stop) do
    N(x) := NeighborhoodOf(x);
    N^(k)(x) := N(x) \ T^(k)(x) \ P^(k)(x) ∪ A^(k)(x);
    x' := BestOf(N^(k)(x));
    x := x';
    k := k + 1;
end while
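The only step in which Algorithm 2 differs from Algorithm 1 is the construction of the admissible neighborhood. A minimal Python sketch of that step follows. Two points are assumptions rather than statements of the paper: the set expression (6) is evaluated left to right, and a move is treated as probably tabu when condition (5) holds for at least one constraint. The function names are hypothetical.

```python
def is_probably_tabu(success_counts, T, alphas):
    """True if, for some constraint i, the estimate S_i / T falls below 1 - alpha_i."""
    return any(s / T < 1 - a for s, a in zip(success_counts, alphas))


def admissible_neighbors(neighbors, tabu, probably_tabu, aspirated):
    """Evaluate ((N \\ T) \\ P) U A on plain Python sets."""
    return (set(neighbors) - set(tabu) - set(probably_tabu)) | set(aspirated)
```

With T = 100 simulations and all thresholds at 0.1, success counts of 95 and 80 make a move probably tabu, because the second estimate 0.80 is below 0.90.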



In the same way, we can adapt the function in (7) to take into account the
unfeasibility of chance-constraints. For instance, the $p_i(x)$ function for the i-th
constraint (2) can have the following general form:

$$p_i(x) = 1 - \alpha_i - P\Big(\sum_{j} a_{ij} x_j \le b_i\Big), \quad i = 1, \dots, m. \tag{8}$$
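As a rough illustration of how (7) and (8) could be combined with the simulated estimates, the Python sketch below replaces the exact probability in (8) by the frequency S_i / T. Clipping each term at zero, so that satisfied constraints contribute no penalty, is an assumption made here to keep $p_i(x) \ge 0$; the name `chance_penalty` is hypothetical.

```python
def chance_penalty(success_counts, T, alphas, betas):
    """Sum of beta_i * p_i, with p_i = max(0, 1 - alpha_i - S_i / T)."""
    total = 0.0
    for s, alpha, beta in zip(success_counts, alphas, betas):
        p_i = max(0.0, (1 - alpha) - s / T)  # shortfall of the estimated probability
        total += beta * p_i
    return total
```

The returned value would be added to the objective z of a candidate solution, as in the general form (7).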



3 Test Problems

In order to test the SIMTS-CCP algorithm, we will consider two NP-hard
optimization problems arising in the design of telecommunication networks based
on SONET technology. For this class of problems, extensive computational
experiments have been carried out and efficient tabu search algorithms are available [1].




Fig. 1. A SONET network with DXC 
The SONET network is a collection of rings connecting a set of customer
sites. Each customer needs to transmit, receive and relay a given traffic with 
a subset of the other customers. Add-Drop Multiplexers (ADM) and Digital 
Cross Connectors (DXC) are the technologies allowing the connection between 
customers and rings. Since they are very expensive, the main concern is to reduce 
the number of DXCs used. 

Two main topologies for the design of a SONET network are available. The 
first topology consists in the assignment of each customer to exactly one ring 
by using one ADM and allowing connection between different rings through a 
unique federal ring composed by one DXC for each connected ring. The objective 
of this problem, say SRAP and depicted in Figure 1, is to minimize the number 
of DXCs. 

Under some distance requirements, a second topology is possible: the use of 
a federal ring is avoided assigning each traffic between two different customers 
to only one ring. In this case, each customer can belong to different rings. The 
objective of this problem, say idp and depicted in Figure 2, is to minimize the 
number of ADMs. For further details see e.g. [2,1,11]. 




Fig. 2. A SONET network without DXC 



In order to formulate these problems, we consider the undirected graph G =
(V, E): the node set V (|V| = n) contains one node for each customer; the edge
set E has an edge [u, v] for each pair of customers u, v such that the amount of
traffic $d_{uv}$ between u and v is greater than 0, and $d_{uv} = d_{vu}$, ∀u, v ∈ V, u ≠ v.
Given a subset of edges $E_i \subset E$, let $V(E_i) \subset V$ be the subset of nodes induced
by $E_i$, i.e. $V(E_i) = \{u, v \in V : [u, v] \in E_i\}$.
SRAP Formulation

Given a partition of V into r subsets $V_1, V_2, \dots, V_r$, the corresponding SRAP
network is obtained by defining r local rings, connecting each customer of subset $V_i$
to the i-th local ring, and one federal ring, connecting the r local rings by using
r DXCs. The resulting network uses n ADMs and r DXCs.

Solving SRAP corresponds to finding the partition $V_1, \dots, V_r$ minimizing r,
and such that

$$\sum_{u \in V_i} \sum_{\substack{v \in V \\ v \ne u}} d_{uv} \le B, \quad i = 1, \dots, r \tag{9a}$$

$$\sum_{i=1}^{r-1} \sum_{j=i+1}^{r} \sum_{u \in V_i} \sum_{v \in V_j} d_{uv} \le B \tag{9b}$$

Constraints (9a) and (9b) impose, respectively, that the capacity bound B is
satisfied for each local ring and for the federal ring.
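A literal reading of (9a) and (9b) can be checked with a few lines of Python. The sketch assumes the symmetric demands are stored once per unordered pair in a dictionary; the function name `srap_feasible` and the data layout are hypothetical, not taken from [1].

```python
def srap_feasible(partition, demand, B):
    """Check (9a) for every local ring and (9b) for the federal ring."""
    ring_of = {u: i for i, ring in enumerate(partition) for u in ring}
    nodes = list(ring_of)

    def d(u, v):
        # Demands are symmetric and stored once per unordered pair.
        return demand.get((u, v), demand.get((v, u), 0))

    # (9a): for each ring, sum d_uv over its customers u and all other customers v.
    for ring in partition:
        load = sum(d(u, v) for u in ring for v in nodes if v != u)
        if load > B:
            return False
    # (9b): traffic between customers assigned to different rings.
    federal = sum(
        d(u, v)
        for idx, u in enumerate(nodes)
        for v in nodes[idx + 1:]
        if ring_of[u] != ring_of[v]
    )
    return federal <= B
```

Note that, read literally, (9a) counts a demand between two customers of the same ring once for each endpoint, and the sketch follows that reading.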



IDP Formulation 

Given a partition of E into r subsets $E_1, E_2, \dots, E_r$, the corresponding IDP
network can be obtained by defining r rings and connecting each customer
of $V(E_i)$ to the i-th ring by means of one ADM. The resulting network uses
$\varphi = \sum_{i=1}^{r} |V(E_i)|$ ADMs and no DXC.

Solving IDP corresponds to finding the partition $E_1, \dots, E_r$ minimizing $\varphi$ and
such that

$$\sum_{[u,v] \in E_i} d_{uv} \le B, \quad i = 1, \dots, r \tag{10}$$

Constraints (10) assure that the traffic capacity bound B for each ring is not
exceeded. We finally remark that IDP always has a feasible solution, e.g. the one
with |E| rings, each composed of a single edge.
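The IDP objective and constraint (10) are simple to evaluate for a given edge partition. The Python sketch below assumes each ring is a list of edges given as node pairs and that `demand` maps each edge to its traffic; the function name `idp_cost_and_feasible` is hypothetical.

```python
def idp_cost_and_feasible(edge_partition, demand, B):
    """Return (number of ADMs, whether every ring respects the bound B)."""
    # One ADM per customer per ring it belongs to: sum over rings of |V(E_i)|.
    adms = sum(
        len({node for edge in ring for node in edge}) for ring in edge_partition
    )
    # (10): the total demand carried by each ring must not exceed B.
    feasible = all(sum(demand[e] for e in ring) <= B for ring in edge_partition)
    return adms, feasible
```

For three customers with demands 3, 4 and 5, placing the first two edges on one ring and the third on another uses 3 + 2 = 5 ADMs.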



Stochastic Formulations 



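The stochastic versions of SRAP and IDP consider the demands $d_{uv}$ as random
parameters. The corresponding chance-constrained programs are obtained by
replacing constraints (9a) and (9b) with

$$P\Big(\sum_{u \in V_i} \sum_{\substack{v \in V \\ v \ne u}} d_{uv} \le B\Big) \ge 1 - \alpha_i, \quad i = 1, \dots, r \tag{11a}$$

$$P\Big(\sum_{i=1}^{r-1} \sum_{j=i+1}^{r} \sum_{u \in V_i} \sum_{v \in V_j} d_{uv} \le B\Big) \ge 1 - \alpha_0 \tag{11b}$$

for SRAP, and constraints (10) with

$$P\Big(\sum_{[u,v] \in E_i} d_{uv} \le B\Big) \ge 1 - \alpha_i, \quad i = 1, \dots, r \tag{12}$$

for IDP.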

4 The Algorithms for SRAP and IDP 

In [1] the authors proposed a short-term memory TS guided by a variable
objective function: the main idea of the variable objective function $z_v$ is to lead the
search within the solution space from unfeasible solutions to feasible ones, as in
Table 1.

Table 1. Variable objective function $z_v$

                  SRAP                                     IDP
                  feas. sol    not feas. sol               feas. sol        not feas. sol
feas. sol         k B + BN     (k + 1) \widetilde{BN}      \varphi B + M    2 \varphi \widetilde{M}
not feas. sol     k B          n \widetilde{BN}            \varphi B        2 \varphi \widetilde{M}

where
BN is the maximum value between maximum ring and federal ring capacities,
\widetilde{BN} is the value of BN associated to an unfeasible solution,
M is the maximum ring capacity value,
\widetilde{M} is the value of M associated to an unfeasible solution.


A diversification strategy is implemented by varying multiple neighborhoods 
during the search. More specifically, dmn uses mainly a neighborhood based on 
moving one customer or demand at a time in such a way that the receiving ring 
does not exceed the bound B. Otherwise, if B is exceeded, we consider also the 
option of switching two customers or demands belonging to two different rings. 
After A consecutive non-improving iterations, a second neighborhood is used for
a few moves. During this phase, DMN empties a ring by moving its elements
(customers or demands, respectively for SRAP and IDP) to the other rings, disregarding
the capacity constraint while locally minimizing the objective function.

To make the evaluation of each move efficient, some data structures
representing the traffic within and outside a ring are maintained throughout the
computation.

The whole algorithm is called Diversification by Multiple Neighborhoods, 
say DMN. For a more detailed description of the whole algorithm, refer to the 
description given in [1]. 
The SIMTS-CCP Algorithms

In order to devise the SIMTS-CCP algorithms for our problems, we need to
implement the evaluation of chance-constraints through the generation of random
parameters $d_{uv}$.

As described in section 2, we need T observations of $S_i^{(k,t)}$ to evaluate the
value of $\tilde S_i^{(k)}$ defined in (4). Moreover, each $S_i^{(k,t)}$ needs the generation of all $d_{uv}$
traffic demand values according to their probability distribution function.



Algorithm 3 Computation of $\tilde S_i^{(k)}$

for all t = 1, ..., T do
    generate random $d_{uv}$, ∀u, v ∈ V, u ≠ v;
    for i = 1, ..., r do compute $S_i^{(k,t)}$;
end for
for i = 1, ..., r do compute $\tilde S_i^{(k)}$;



Algorithm 3 describes, in more detail, the evaluation of $\tilde S_i^{(k)}$. Since the
number of traffic demands $d_{uv}$ is $O(n^2)$, the complexity of the computation of the
$\tilde S_i^{(k)}$ values is $O(T \times n^2)$. Note that the complexity of this computation can be reduced by
employing the current traffic values available in the traffic data structures.
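The loop structure of Algorithm 3 can be sketched in Python for the local rings of SRAP. This is an illustrative sketch under stated assumptions: `sample_demands` is a hypothetical callable that returns one simulated demand dictionary (one entry per unordered pair), the ring load follows the literal form of (9a), and the federal ring constraint is left out for brevity.

```python
import random


def simulate_success_counts(partition, sample_demands, B, T, rng):
    """Return [S_1, ..., S_r]: per ring, the number of the T scenarios with load <= B."""
    counts = [0] * len(partition)
    all_nodes = [u for ring in partition for u in ring]
    for _ in range(T):
        d = sample_demands(rng)  # one simulated scenario of all demands d_uv
        for i, ring in enumerate(partition):
            load = sum(
                d.get((u, v), d.get((v, u), 0))
                for u in ring
                for v in all_nodes
                if v != u
            )
            if load <= B:        # the scenario is a success for ring i
                counts[i] += 1
    return counts
```

Each scenario costs O(n^2) demand look-ups, so the whole computation is O(T × n^2), in line with the bound stated above.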

Here we propose two SIMTS-CCP algorithms derived from DMN. The basic
idea is to add the computation of $\tilde S_i^{(k)}$ to the original framework of DMN at
the end of neighborhood exploration: starting from solution x, the algorithms
generate each possible move x' as in DMN using the mean value of $d_{uv}$; then, the
stochastic feasibility (with respect to the chance-constraints) is tested through the
computation of $\tilde S_i^{(k)}$. The algorithms differ in how the test is used to reject, or
not, solution x'.

The first one, called DMN-STOCH-1, is the simplest one: each move not belonging
to $N^{(k)}(x)$, defined in (6), is avoided. In other words, DMN-STOCH-1 allows
only moves which satisfy the chance-constraints. Note that the objective function
remains the one in Table 1.



Table 2. Penalized variable objective function

SRAP
                  feas. sol        not feas. sol
feas. sol         z_v + B s(x)     z_v + BN s(x)
not feas. sol     z_v + B s(x)     z_v + BN s(x)

IDP
                  feas. sol        not feas. sol
feas. sol         z_v + B s(x)     z_v + M s(x)
not feas. sol     z_v + B s(x)     z_v + M s(x)



On the contrary, the second one, say DMN-STOCH-2, allows moves in $P^{(k)}(x)$
but penalizes them using an objective function which also measures how
unfeasible the chance-constraints are. Referring to the general form reported in (7)
and (8), our penalized objective function is depicted in Table 2 where
and 

= |o otherwise ^ 

5 Preliminary Computational Results 

In this section, we report the planning of computational experiments and the 
preliminary results. 

In our computational experiments, we used the well-known Marsaglia-Zaman
generator [14,15], say ranmar. This algorithm is considered one of the best random
number generators available, since it passes all of the standard tests for random
number generators. The algorithm is a combination of a Fibonacci sequence (with lags
of 97 and 33, and operation "subtraction plus one, modulo one") and an "arithmetic
sequence" (using subtraction), which gives a period of 2^144. It is completely portable and
gives an identical sequence on all machines having at least 24 bit mantissas in
the floating point representation.



Table 3. DMN: best computational results.

                   gaps                        avg. time in ms
        # opt      # 1    # 2    # 3    # >3   optimal    all
SRAP    118        0      0      0      0      60.4       60.4
IDP     129        23     2      0      0      808.9      846.4

Results of the Deterministic Version 

Considering the set of 160 benchmark instances generated by Goldschmidt et
al. [11], DMN solves to optimality all the 118 instances which are known to be
feasible for SRAP, with an average computing time of about 60 ms. On the same
benchmark but considering IDP, for which the optimal solution value is known
only for 154 instances, DMN solves 129 instances to optimality, 23 instances with a
gap of 1 and the remaining two instances with a gap of 2. The average computing
time is about 850 ms. The overall results are reported in Table 3.



Results of the Stochastic Version 

We have tested our algorithms on the same benchmark, varying the parameters in
the following ranges: T ∈ {50, 75, 100}, $\alpha_i = \alpha$ ∈ {0.3, 0.2, 0.1}, and keeping
unaltered those giving the best result for the deterministic version (see [1]).
Table 4. DMN-STOCH-1: best and worst computational results.

        type of               gaps                        avg. time in ms
        results    # opt      # 1    # 2    # 3    # >3   optimal    all
SRAP    best       103        3      3      4      5      341.4      397.1
SRAP    worst      99         3      8      1      7      412.3      441.3
IDP     best       105        27     8      4      10     3127.3     3301.5
IDP     worst      101        25     10     3      15     3108.2     3309.9

The comparisons are made with the optimal value of the deterministic ver- 
sion, that is the values obtained using the mean value of traffic demands. Our 
tests try to investigate how far the solution computed by the simts-ccp is from 
the one computed by its deterministic version. 



Table 5. DMN-STOCH-2: best and worst computational results.

        type of               gaps                        avg. time in ms
        results    # opt      # 1    # 2    # 3    # >3   optimal    all
SRAP    best       106        4      5      2      1      394.1      432.7
SRAP    worst      104        5      4      2      3      443.5      457.9
IDP     best       118        22     6      5      3      3212.3     3287.2
IDP     worst      108        25     8      4      9      3459.1     3501.4

The results, reported in Tables 4 and 5, show the impact of the $\tilde S_i^{(k)}$
computation: although the increase in the average computation time is quite remarkable
with respect to the deterministic version, we observe that the quality of solutions
computed by both algorithms is acceptable.

6 Conclusions and Further Work 

The paper addresses the problem of solving chance-constrained optimization
problems by combining Tabu Search and Simulation. After a brief introduction to
stochastic programming, the class of SIMTS-CCP algorithms is proposed. The
reported computational results show that the solutions computed have a quality
comparable to those computed by the deterministic version. The increase in
the average running time is also acceptable.

Further work will be mainly concerned with two topics. The first one is the 
extension of computational experiments regarding simts-ccp. The second one 
concerns the study of a similar algorithm for SPR problems. 
Abstract. Clustering can be used to identify groups of similar solutions
in Multimodal Optimisation. However, a poor clustering quality reduces
the benefit of this application. The vast majority of clustering methods
in the literature operate by resorting to a priori assumptions about the data,
such as the number of clusters or the cluster radius. Clusters are forced to
conform to these assumptions, which may not be valid for the considered 
population. The latter can have a huge negative impact on the cluster- 
ing quality. In this paper, we apply a clustering method that does not 
require a priori knowledge. We demonstrate the effectiveness and effi- 
ciency of the method on real and synthetic data sets emulating solutions 
in Multimodal Optimisation problems. 



1 Introduction 

Many real-world optimisation problems, particularly in engineering design, have 
a number of key features in common: the parameters are real numbers; there are 
many of these parameters; and they interact in highly non-linear ways, which 
leads to many local optima in the objective function. These optima represent 
solutions of distinct quality to the presented problem. In Multimodal Optimisa- 
tion, one is interested in finding the global optimum, but also alternative good 
local optima (ie. diverse high quality solutions). There are two main reasons to
seek more than one optimum. First, real-world functions do not come without
errors, which distort the fitness landscape. Therefore, global optima may not
correspond to the true best solution. This uncertainty is usually addressed by 
considering multiple good optima. Also, the best solution represented by a global 
optimum may be impossible to implement from the engineering point of view. In 
this case, an alternative good solution could be considered for implementation. 

Once a suitable search method is available, an ensemble of diverse, high
quality solutions is obtained. Within this ensemble, there are usually several
groups of solutions, each group representing a different optimum. In other words, 
the ensemble of solutions is distributed into clusters. Clustering can be defined 
as the partition of a data set (ensemble of solutions) into groups named clusters 
(part of the ensemble associated with an optimum). Data points (individuals) 
within each cluster are similar (or close) to each other while being dissimilar to 
the remaining points in the set. 

The identification of these clusters of solutions is useful for several reasons.
First, at the implementation stage, we may want to consider distinct solutions
instead of implementing similar solutions. This is especially convenient when the
implementation costs time and money. In addition, the uncertainty associated
with one solution can be estimated by studying all the similar solutions
(ie. those in the same cluster). Also, one may want to optimise the obtained
solutions further. This is done by using a faster optimiser (one that is less
effective at searching) on the region defined by all solutions. With the
boundaries provided by the clustering algorithm, the same can be done for each
cluster separately (ie. for each different high performance region). Lastly, an
understanding of the solutions is needed. In real world engineering design, for
instance, it is common to have many variables and a number of objectives.
Packham and Parmee [8] claim that in this context it is extremely difficult to
understand the possible interactions between variables and between variables and
objectives. These authors pointed out the convenience of finding the location
and distribution of different High Performance regions (ie. regions containing
high quality solutions).

There are many clustering algorithms proposed in the literature (a good
review has been written by Halkidi et al. [7]). The vast majority of clustering
algorithms operate by using some sort of a priori assumptions about the data, 
such as the cluster densities, sizes or number of clusters. In these cases, clusters 
are forced to conform to these assumptions, which may not be valid for the 
data set. This can have a huge negative impact on the clustering quality. In 
addition, there is a shortage in the literature of effective clustering methods for 
high dimensional data. This problem is known as the Curse of Dimensionality [6] 
in Clustering. This issue is discussed extensively in [1]. 

In this work, we apply a new algorithm [2] for clustering data sets obtained
by the application of a search and optimisation method. The method is known as
CHIDID (Clustering High Dimensional Data) and it is aimed at full dimensional
clustering (ie. all components have a clustering tendency). To use this algorithm
it is not necessary to provide a priori knowledge related to the expected clustering
behaviour (eg. cluster radius or number of clusters). The only input needed is
the data set to be analysed. The algorithm does contain two tuning parameters.
However, extensive experiments, not reported in this paper, lead us to believe
that the performance of the algorithm is largely independent of the values of
these two parameters. The values used for all our work are, we believe, suitable
for most data sets. CHIDID scales linearly with the number of dimensions
and clusters. The output is the assignment of data points to clusters (the number
of clusters is found automatically).

The rest of this paper is organised as follows. We begin by describing the 
clustering method in Sect. 2. Section 3 explains how the test data sets are gen- 
erated. The analysis of the results is discussed in Sect. 4. Lastly, Sect. 5 presents 
the conclusions. 

2 Clustering Method

We describe our clustering method in three stages. We start by introducing the 
notation used. Next, the clustering criterion is presented as a procedure used to 
find the cluster in which a given data point is included. Finally, the clustering 
algorithm is introduced. The operation of this algorithm is based on the iterative 
application of the clustering criterion until all clusters and outliers are found. 



2.1 Notation 

We regard the population as a collection of N distinct vectors, or data points, in an
M-dimensional space, over which the clustering task is performed. We represent
this data set as

$\Omega = \{x^i\}_{i=1}^{N}$ with $x^i \in \mathbb{R}^M \;\; \forall i \in I = \{1, \ldots, N\}$ . (1)

Clustering is the process by which a partition is performed on the data set. 
Points belonging to the same cluster are similar to each other, but dissimilar 
to those belonging to other clusters. This process results in C pairwise disjoint 
clusters, whose union is the input data set. That is 

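$\Omega = \bigcup_{k=1}^{C} \Omega_k$ with $\Omega_k = \{x^i\}_{i \in I_k}$ , (2)

where $\Omega_k$ is the $k$-th cluster, $I_k$ contains the indices for the data points
included in the cluster and $\Omega_k \cap \Omega_l = \emptyset \;\; \forall k \neq l$ with
$k, l \in K = \{1, \ldots, C\}$.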

An outlier is defined as a data point which is not similar to any other point 
in the data set. Hence, an outlier can be regarded as a cluster formed by a single 
point . 

Proximity (also known as similarity) is the measure used to quantify the 
degree of similarity between data points. A low value of the proximity between 
two points means these points are similar. Conversely, a high value of their 
proximity implies that the points are dissimilar. In this work, the Manhattan 
distance is adopted as the proximity measure. Thus the proximity of $x^i$ with
respect to $x^l$ is calculated as

$p_{il} = \sum_{j=1}^{M} |x^i_j - x^l_j|$ . (3)

In practice, the value range of a given component can be very large, in which
case it would dominate the contribution of the components with much smaller
value ranges. In this case, we would recommend scaling all components to the same
range so that every component contributes equally to the proximity measure.
However, in this paper the scaling will not be necessary since all components 
will be in the same interval. 
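
As an illustration of (3) and of the scaling recommended above, the following is a minimal sketch, assuming the data set is stored as an N x M NumPy array. The function names `scale_components` and `proximities` are hypothetical and not part of the original method description.

```python
import numpy as np

def scale_components(data):
    """Rescale every component (column) to [0, 1]; constant columns map to 0."""
    data = np.asarray(data, dtype=float)
    lo = data.min(axis=0)
    span = data.max(axis=0) - lo
    span[span == 0.0] = 1.0  # avoid division by zero for constant components
    return (data - lo) / span

def proximities(data, i):
    """Manhattan proximity p_il from point i to every point l, as in (3)."""
    data = np.asarray(data, dtype=float)
    return np.abs(data - data[i]).sum(axis=1)

# Three points in a 2-dimensional space (illustrative values only).
points = np.array([[0.0, 0.0], [1.0, 2.0], [3.0, -1.0]])
p = proximities(points, 0)           # proximities from the first point
scaled = scale_components(points)    # every column now spans [0, 1]
```

Scaling is applied once to the whole data set before any proximity is computed, so that all proximities are measured on the same footing.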

Based on the proximity measure, a criterion is needed to determine if two
points should be considered members of the same cluster. The quality of the 
clustering directly depends on the choice of criterion. In this work, clusters are 
regarded as regions, in the full dimensional space, which are densely populated 
with data points, and which are surrounded by regions of lower density. The 
clustering criterion must serve to identify these dense regions in the data space. 



2.2 Clustering Criterion 

As we previously described, points belonging to the same cluster have a low
proximity between them, and high proximity when compared with points belonging
to other clusters. The proximities $\{p_{il}\}_{l=1}^{N}$ from an arbitrary point $x^i$ to
the rest of the data set (ie. with $l \neq i$) should split into two groups, one group
of similar low values and the other of significantly higher values. The first group
corresponds to the cluster to which $x^i$ belongs. The goal of the clustering
criterion is to determine the cluster cutoff in an unsupervised manner. The
criterion is also expected to identify outliers as points which are not similar to any
other point in the data set.




Fig. 1. Example data set with N= 150, M= 3 and C= 5 



In order to illustrate the operation of the clustering criterion, consider the
data set shown in Fig. 1. It contains 150 points unevenly distributed in 5 clusters
in a 3-dimensional space. Let us take a point from one of the clusters (say the
$k$-th) in the example, which will be referred to as the cluster representative and
denoted by $x^{i_k}$. We apply the following procedure to determine which are its
cluster members:

1. Calculate the sequence $\{p_{i_k l}\}_{l=1}^{N}$ using (3), and sort it in increasing order
to get $\{d_l\}_{l=1}^{N}$. The sorting operation is represented as a correspondence
between sets of point indices, that is, $S : I \to L$. The plot of $d_l$ can be found
at the bottom part of Fig. 2. Note that $d_1 = 0$ corresponds to $x^{i_k}$ itself, $d_2$ to
the nearest neighbour of $x^{i_k}$, and so on.

2. Define the relative linear density up to the $l$-th data point as
$\theta_l = \frac{l / (\beta + d_l)}{N / (\beta + d_N)}$ and compute $\{\theta_l\}_{l=1}^{N}$. This expression is
equivalent to the cumulative number of points divided by an estimation of the
cluster size, relative to the same ratio applied over all N points. $\beta$ is a
parameter that will be discussed later (we recommend $\beta = \frac{1}{2} d_2$).

3. Define the relative linear density increment for the $l$-th point as
$\Delta\theta_l = |\theta_l - \theta_{l-1}|$ and calculate the sequence $\{\Delta\theta_l\}_{l=2}^{N}$. The plot of
$\Delta\theta_l$ is presented at the top part of Fig. 2. If $\Delta\theta_l$ is high then it will be
unlikely that the $l$-th point forms part of the cluster.

4. Calculate $l_1 = \{l \in \{2, \ldots, N\} \mid \Delta\theta_l = \max(\{\Delta\theta_{l'}\}_{l'=2}^{N})\}$ (ie. the position
of the highest value of $\Delta\theta_l$ in Fig. 2). Define $(l_1 - 1)$ as the provisional
cluster cutoff, which means that only the $l_1 - 2$ closest points to the cluster
representative would be admitted as members of the cluster.

5. Define the significant peaks in $\{\Delta\theta_l\}_{l=2}^{N}$ as those above the mean by $\alpha$ times
the standard deviation, ie. $\{l \in \{2, \ldots, N\} \mid \Delta\theta_l > \overline{\Delta\theta_l} + \alpha\,\sigma_{\Delta\theta_l}\}$. $\alpha$ is a
parameter that will be discussed later (we recommend $\alpha = 5$).

6. Identify the significant peak with the lowest value of $l$, that is, the leftmost
significant peak, and take it as the definitive cluster cutoff $l_c$. In Fig. 2 (top),
the example shows two significant peaks, which are above the horizontal
dotted line given by $\overline{\Delta\theta_l} + \alpha\,\sigma_{\Delta\theta_l}$. If all the peaks are below
$\overline{\Delta\theta_l} + \alpha\,\sigma_{\Delta\theta_l}$, then take the highest as the cluster cutoff, ie. $l_c = l_1 - 1$.

7. Finally, the cluster is given by $L_k = \{1, \ldots, l_c\}_k$. Invert the sorting
operation ($I_k = S^{-1}(L_k)$) to recover the original point indices which define the
cluster as $\Omega_k = \{x^i\}_{i \in I_k}$.
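
The seven steps above can be sketched in Python as follows. This is only an illustrative reading of the criterion, not the authors' implementation: the function names are hypothetical, the default values of `alpha` and `gamma` are assumptions, the population standard deviation is assumed in Step 5, and the cutoff is taken as the position of the selected peak minus one, so that the peak itself is excluded from the cluster.

```python
import numpy as np

def cluster_cutoff(d, alpha=5.0, gamma=0.5):
    """Return l_c for sorted proximities d, where d[0] == 0 is the representative."""
    d = np.asarray(d, dtype=float)
    n = d.size
    beta = gamma * d[1]                   # assumption: beta tied to the nearest neighbour
    l = np.arange(1, n + 1)               # 1-based positions in the sorted sequence
    theta = (l / (beta + d)) / (n / (beta + d[-1]))    # Step 2
    dtheta = np.abs(np.diff(theta))       # Step 3: dtheta[j] is the increment at l = j + 2
    threshold = dtheta.mean() + alpha * dtheta.std()   # Step 5
    significant = np.flatnonzero(dtheta > threshold)
    if significant.size > 0:              # Step 6: leftmost significant peak
        peak = int(significant[0]) + 2
    else:                                 # no significant peak: use the highest one (Step 4)
        peak = int(np.argmax(dtheta)) + 2
    return peak - 1                       # assumption: points 1..l_c form the cluster

def criterion(data, i, alpha=5.0, gamma=0.5):
    """Indices of the points in the cluster of point i (Steps 1 and 7)."""
    data = np.asarray(data, dtype=float)
    p = np.abs(data - data[i]).sum(axis=1)     # Step 1: Manhattan proximities (3)
    order = np.argsort(p, kind="stable")       # sorting map S
    l_c = cluster_cutoff(p[order], alpha, gamma)
    return np.sort(order[:l_c])                # Step 7: invert S

# Two well separated groups on a line (illustrative values only).
data = np.array([[0.0], [0.1], [0.2], [0.3], [0.4],
                 [10.0], [10.1], [10.2], [10.3], [10.4]])
members = criterion(data, 0)
```

The sketch assumes that all points are distinct, so that the nearest-neighbour proximity is strictly positive and the densities are well defined.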

Under the $\Delta\theta_l$ representation pictured in Fig. 2 (top), the natural cluster to
which the cluster representative ($l = 1$) belongs becomes distinguishable as the
data points to the left of the leftmost significant peak. As has been previously
discussed, cluster members share a similar value of $d_l$ among themselves and
are dissimilar when compared to non-members. As a consequence, the sequence
$\{\Delta\theta_l\}_{l=2}^{l_c}$ contains low values, whereas $\Delta\theta_{l_c+1}$ is higher in comparison.
Nevertheless, $\Delta\theta_{l_c+1}$ is not necessarily the highest in the sequence $\{\Delta\theta_l\}_{l=2}^{N}$;
therefore Step 6 is required to localise $\Delta\theta_{l_c+1}$ as the innermost peak. A decrease
in the value of $\alpha$ will result in clusters with a lower proximity among their
members, in other words, more restrictive clusters. In the light of these
considerations, $\alpha$ can be regarded as a threshold for the clustering resolution.
Despite the fact that $\alpha$ plays a relevant role in data sets containing few points
and dimensions, as the number of points and dimensions increases the plot of
$\Delta\theta_l$ tends to show a single sharp peak corresponding to the cluster cutoff. In
that case it suffices to take any high value of $\alpha$.

Another issue is the role of $\beta$ in the definition of $\theta_l$ at Step 2. $d_l$ is a better
estimation of the cluster diameter than $\beta + d_l$. However, the latter is preferred
since with the former neither $\theta_1$ nor $\Delta\theta_2$ would be defined due to the zero value
of $d_1$. These quantities lack meaning in the sense that a density cannot be defined

Fig. 2. Characteristic $\{\Delta\theta_l\}_{l=2}^{N}$ (top) and $\{d_l\}_{l=1}^{N}$ (bottom, represented by ‘+’ signs)
plots for a given point in the example data set



for only one point. Without $\Delta\theta_2$, it would not be possible to find out whether
the cluster representative is an outlier or not. A suitable value of $\beta$ will vary
depending on the cluster and hence it is preferable to link it to an estimation
of the proximity between points in the cluster. We choose to fix $\beta = \gamma d_2$ with
$0 < \gamma < 1$ (a high $\gamma$ implies a higher tendency to include outliers in clusters).
In appendix A, $\gamma$ is shown to be a threshold for outlier detection. Note that an
additional advantage of posing $\beta$ in this form is that it becomes negligible with
respect to high values of $d_l$ and hence ensures the density effect of $\theta_l$. We set
$\gamma = \frac{1}{2}$ throughout this work.

Finally, the detection of outliers in very high dimensional spaces is a difficult
task that requires a preliminary test in order to ensure the effectiveness of
the outlier detection for every value of $\gamma$. If this additional test, placed before
Step 2, concludes that the cluster representative is an outlier, that is, $l_c = 1$, then
Steps 2 to 6 are skipped. This test consists in checking whether the proximity
from the cluster representative to the nearest neighbour, $d_2$, is the highest of all
consecutive differences in $d_l$, calculated as $\Delta d_l = |d_l - d_{l-1}|$. Note
that this constitutes a sufficient condition for the cluster representative to be
an outlier since it implies that the considered point is not similar to its nearest
neighbour. This issue is explained further in [1].
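
A minimal sketch of this preliminary test is given below. The function name is hypothetical, and two details are assumptions rather than statements of the original method: ties between gaps are not counted as "the highest", and a data set with fewer than two points is treated as a trivial outlier.

```python
import numpy as np

def is_outlier_by_pretest(d):
    """True if the gap to the nearest neighbour exceeds every other consecutive gap.

    d is the sorted proximity sequence with d[0] == 0 for the representative,
    so the first consecutive difference equals d_2.
    """
    d = np.asarray(d, dtype=float)
    if d.size < 2:
        return True                      # assumption: a lone point is an outlier
    gaps = np.abs(np.diff(d))            # consecutive differences of d
    return bool(np.all(gaps[0] > gaps[1:]))

isolated = is_outlier_by_pretest([0.0, 5.0, 5.2, 5.3, 6.0])
clustered = is_outlier_by_pretest([0.0, 0.1, 0.2, 5.0])
```

When the test is positive, the representative is reported as a single-point cluster and the density-based steps are not evaluated.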



2.3 Clustering Algorithm 

In the previous section we have presented a criterion to determine the cluster to 
which a given data point belongs. We describe now an algorithm based on the 
designed criterion to carry out the clustering of the whole data set, whose flow 
diagram is presented in Fig. 3. 

Fig. 3. Flow diagram of CHIDID



1. Randomly choose a point, $x^{i_k}$, from the data set, that has not been allocated
to a cluster.

2. Apply the clustering criterion over the whole data set to determine the cluster
associated to $x^{i_k}$.

3. In case that the proposed cluster, $I_k$, contains any points already included
in any of the $k - 1$ previous clusters, delete them from the proposed cluster.
The remaining points form the definitive cluster given by the index vector $I_k$.

4. Form the non-allocated index vector $I_{na} = I \setminus \left(\bigcup_{k'=1}^{k} I_{k'}\right)$. If $I_{na} \neq \emptyset$, pass
to next cluster by randomly choosing a point $i_{k+1}$ from $I_{na}$ and go back to
Step 2. Otherwise stop the algorithm and return $\{I_k\}_{k=1}^{C}$, which constitutes
the solution described in (2).

Let us go back to the example in Fig. 1 to illustrate the algorithm. Step 1
picks at random a point to initiate the clustering process. Step 2 finds
the cluster associated with the selected point. Step 3 deletes no points, since
no cluster has been formed yet. In Step 4, $I_{na} \neq \emptyset$, since $I_{na}$ contains points
corresponding to the four clusters yet to be discovered. Therefore another point
is selected at random among those in $I_{na}$.
We repeat Steps 1, 2 and 3, but this time Step 3 checks whether the current
cluster intersects with the previous one or not, after which the second cluster is
formed. The latter procedure is repeated until all points have been allocated to
their respective clusters.
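
The outer loop of the algorithm can be sketched as follows. This is a hedged illustration rather than the authors' code: `chidid_loop` is a hypothetical name, the clustering criterion is passed in as a function, and `toy_criterion` is only a stand-in (a fixed radius on a line) used to exercise the loop, not the density-based criterion of Sect. 2.2.

```python
import random

def chidid_loop(n_points, criterion, rng=None):
    """Repeat Steps 1 to 4 until every point index is allocated to a cluster."""
    rng = rng or random.Random()
    unallocated = set(range(n_points))            # the index vector I_na
    clusters = []
    while unallocated:
        rep = rng.choice(sorted(unallocated))     # Steps 1 and 4: random representative
        proposed = set(criterion(rep)) | {rep}    # Step 2 (the representative always belongs)
        definitive = proposed & unallocated       # Step 3: drop already allocated points
        clusters.append(sorted(definitive))
        unallocated -= definitive                 # Step 4: update I_na
    return clusters

# Stand-in criterion: all points closer than 1.0 to the representative.
coords = [0.0, 0.2, 0.4, 5.0, 5.1, 9.0]

def toy_criterion(rep):
    return [j for j, c in enumerate(coords) if abs(c - coords[rep]) < 1.0]

clusters = chidid_loop(len(coords), toy_criterion, random.Random(1))
```

Because every iteration allocates at least the representative, the loop runs at most N times, and the criterion is evaluated once per cluster or outlier found.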

As clusters can be arbitrarily close, it is possible to have points from different
clusters within a proximity $d_{l_c}$ from the cluster representative. In such cases,
the algorithm is likely to merge all these natural clusters together in a single 
output cluster. The aim of Step 3 is to ensure that formed clusters are pairwise 
disjoint. This is achieved by removing from the current cluster those points 
already allocated in previously found clusters. However, it must be highlighted 
that a merge is a very unlikely event. Firstly, a merge can only happen if the 
representative is located in the cluster region closest to the neighbouring cluster. 
Even in low dimensions, few points have this relative location within a cluster. 
Therefore, the likelihood of selecting one of these problematic representatives 
is also low. Secondly, if neighbouring clusters present a significant difference 
in the value of just one component then a merge cannot occur. Thus, while it 
is strictly possible, the likelihood of having a merge quickly diminishes as the 
dimensionality increases. 

CHIDID is a method aiming at full dimensional clustering (ie. all components
exhibit clustering tendency). It is expected to be able to handle data containing
a few components without clustering tendency. However, the contribution of 
these irrelevant components to the proximity measure must be much smaller 
than that of the components with clustering tendency. In this work, we restrict 
the performance tests to data sets without irrelevant components. This issue will 
be discussed further when generating the test data sets in Sect. 3.2. 

Since $\alpha$ and $\gamma$ are just thresholds, the only input to the algorithm is the data
set containing the population. No a priori knowledge is required. The output
is the assignment of data points (solutions) to clusters (optima or high
performance regions), whose number is automatically found. The performance of the
algorithm does not rely on finding suitable values for the parameters. A wide
range of threshold values will provide the same clustering results if clusters and
outliers are clearly identifiable. The algorithm naturally identifies outliers as
a byproduct of its execution. Within this approach, an outlier is found whenever
the additional outlier test is positive or the criterion determines that the cluster
representative is actually the only member of the cluster.

CHIDID carries a low computational burden. It is expected to run
quickly, even with large high dimensional data sets. There are several reasons for 
this behaviour. First, the algorithm only visits populated regions. This consti- 
tutes a significant advantage in high dimensional spaces. Second, it scales linearly 
with M, the dimensionality of the space. Finally, it only calculates C times N 
distances between points, where C is the number of found clusters and outliers. 



3 Generation of Test Data Sets 

3.1 Synthetic Data Sets 

These data sets emulate the output of an ideal search and optimisation method
on a multimodal optimisation problem. The assumed structure is that most of the
data points are distributed into clusters, with a small subset of points as outliers.

The generation of these synthetic data sets is beneficial for two reasons.
First, emulated outputs can be designed with very diverse characteristics. This 
is important because, in practice, there is not a unique form for these outputs. 
Second, it allows us to test the clustering algorithm with data sets of higher 
dimensionality than those achievable by current search methods. 

It is reasonable to assume that each cluster of solutions is confined within the
basin of attraction of a different function minimum (we restrict to minimisation
without loss of generality). The Rastrigin function is suitable to generate such
synthetic outputs because the location and extent of its minima are known. It is
defined as

$f(x) = MA + \sum_{j=1}^{M} \left( x_j^2 - A \cos(2 \pi x_j) \right)$ . (4)

This function is highly multimodal. It has a local minimum at any point with
every component $x_j$ taking an integer value $k_j$ and a single global minimum, at
$x = 0$. The function has a parabolic term and a sinusoidal term. The parameter
$A$ controls the relative importance between them (we set $A = 10$).
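
For reference, (4) can be evaluated with the short sketch below, assuming NumPy; the function name `rastrigin` is chosen here for illustration.

```python
import numpy as np

def rastrigin(x, A=10.0):
    """Rastrigin function (4): M*A + sum_j (x_j^2 - A*cos(2*pi*x_j))."""
    x = np.asarray(x, dtype=float)
    return A * x.size + np.sum(x**2 - A * np.cos(2.0 * np.pi * x))

at_origin = rastrigin(np.zeros(20))          # global minimum
at_integers = rastrigin([1.0, -2.0, 0.0])    # integer-valued point
```

At a point with integer components the cosine terms equal one, so the value reduces to the parabolic term, that is, the sum of the squared components.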

We generate the first synthetic data set (SD1) as follows. It contains $N_c = 100$
data points distributed into clusters, and $N_o = 10$ outliers with each component
selected at random within $[-2.5, 2.5]$. The total number of points is $N = N_c + N_o$
and the number of variables is $M = 20$. Points are distributed among five clusters
in the following way: $N_1 = 0.4 N_c$, $N_2 = N_3 = 0.2 N_c$ and $N_4 = N_5 = 0.1 N_c$.

Each cluster is associated with a minimum, whose components are picked at
random among $c_j \in \{-2, -1, 0, 1, 2\}$. The basin of attraction of the minimum
is defined by the interval $[c_j - 0.25, c_j + 0.25]$. The size of the interval is given
by half the distance between consecutive function minima. Point components
in each cluster are generated at random within the basin of attraction of the
corresponding minimum.
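
A data set with the structure of SD1 could be generated along the following lines. This is a sketch under stated assumptions: the function name `make_sd1`, the use of uniform sampling, the seed handling and the label convention (-1 for outliers) are illustrative choices, and the cluster fractions 0.4, 0.2, 0.2, 0.1, 0.1 are one reading of the description above.

```python
import numpy as np

def make_sd1(seed=0, n_c=100, n_o=10, m=20, half_width=0.25):
    """Generate an SD1-like data set: five clusters plus uniformly placed outliers."""
    rng = np.random.default_rng(seed)
    fractions = [0.4, 0.2, 0.2, 0.1, 0.1]          # assumed cluster proportions
    sizes = [int(round(f * n_c)) for f in fractions]
    points, labels = [], []
    for k, size in enumerate(sizes):
        centre = rng.integers(-2, 3, size=m)       # c_j drawn from {-2, ..., 2}
        low, high = centre - half_width, centre + half_width
        points.append(rng.uniform(low, high, size=(size, m)))
        labels += [k] * size
    points.append(rng.uniform(-2.5, 2.5, size=(n_o, m)))   # outliers
    labels += [-1] * n_o
    return np.vstack(points), np.array(labels)

X, y = make_sd1(seed=1)
```

The labels are kept only to check a clustering result afterwards; the clustering method itself receives the array of points alone.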

In addition, we want to consider data sets with very diverse characteristics to
check the performance robustness of the clustering method. We start by describing
an automatic way to construct data sets. This is based on the previous data
set, but with two variants. First, each point component is now generated within
$[c_j - \Delta x, c_j + \Delta x]$, where $\Delta x$ is a random number in $(0, 1/2)$. Second,
the number of points of each cluster is now determined as follows. We first
draw $r_k$, which controls the proportion of points in cluster $k$, from a uniform
distribution in the range $(1, 5)$. Next, we determine the number
of points $N_k$ in cluster $k$ using the formula $N_k = N_c r_k / \sum_{k'=1}^{C} r_{k'}$, where $C$ is the
number of generated clusters. Note that each of these data sets contains clusters
of different sizes, cardinalities, densities and relative distances between them.
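
The cluster cardinalities described above can be sketched as follows. The formula yields real numbers, so some rounding rule is needed; the one used here (round down, then give the remainder to the cluster with the largest weight) is an assumption of this sketch, as is the function name.

```python
import numpy as np

def cluster_sizes(n_c, c, rng):
    """Draw r_k ~ U(1, 5) and split n_c points proportionally to r_k."""
    r = rng.uniform(1.0, 5.0, size=c)
    sizes = np.floor(n_c * r / r.sum()).astype(int)
    sizes[np.argmax(r)] += n_c - sizes.sum()   # assumed rule for the rounding remainder
    return sizes

sizes = cluster_sizes(1000, 7, np.random.default_rng(0))
```

Since every weight lies between 1 and 5, no cluster can receive less than a fraction 1/(5C - 4) of the points, so all clusters are non-empty for the values of the number of points and clusters considered here.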

Finally, we define eight groups of data sets from all possible combinations of
$N_c = 100, 1000$; $M = 10, 100$ and $C = 3, 7$. For all of them we add $N_o = 10$
randomly generated outliers within $[-2.5, 2.5]$. For each group $(N_c, M, C)$, we
create ten realisations of the data set. We will refer to these data set groups as
SD2 to SD9. All these data sets are summarised in Table 1.

Table 1. Test data sets. $N_c$ is the number of points in clusters, $M$ is the dimensionality
and $C$ is the number of clusters. In the real data sets, RD1 and RD2, $C$ is unknown.

Set   SD1   SD2   SD3   SD4   SD5   SD6    SD7    SD8    SD9    RD1   RD2
N_c   100   100   100   100   100   1000   1000   1000   1000   177   203
M     20    10    100   10    100   10     100    10     100    5     7
C     5     3     3     7     7     3      7      3      7      ?     ?



3.2 Real Data Sets 

We generate two real data sets (RD1 and RD2) by applying a search method on
an analytical function with four global minima. This function is used in [5] and
defined as



f{x)=A 1 



Cg 




) 



+ B 



d{x,b{xi)) 

Vm 



( 5 ) 



with 



= 0.5 



2 if k = 1
5 if k = 2
7 if k = 3
9 if k = 4



$b(x_1) = \begin{cases} 2 & \text{if } 0 < x_1 < 4 \\ 5 & \text{if } 4 < x_1 < 6 \\ 7 & \text{if } 6 < x_1 < 8.5 \\ 9 & \text{if } 8.5 < x_1 < 10 \end{cases}$



where $d(\cdot,\cdot)$ is the Euclidean distance, $c_g = 4$ (number of global minima), $M$
is the dimensionality, $A = 5$ and $B = 1$. As a search method, we use a
Real-parameter Genetic Algorithm (GA) [3] [4] that has been modified [5] to be
effective at finding multiple good minima. In brief the details are: a steady-state
population of size N, parents are selected randomly (without reference to their
fitness), crossover is performed using a self-adaptive parent-centric operator,
and culling is carried out using a form of probabilistic tournament replacement
involving NREP individuals.

The output of this GA is formed with all individuals that enter the population
during the run. Thus, many of them are not optimised. From the clustering
point of view, these points contain many irrelevant components which harm the
clustering. Inevitably, one cannot find clusters where there are no clusters to
find. Therefore, a preprocessing step is needed to include most of the points with
clustering tendency. The preprocessing consists in defining a threshold value for
the objective function and including in the data set all points in the output below
that threshold. The chosen threshold is $f(x) < 1$.

The first data set, RD1, is obtained by applying the GA (N = 100 and
NREP = 30, using 40,000 function evaluations) on the 5-dimensional instance of
the function. RD2 comes from the application of the GA (N = 150 and NREP = 40,
using 200,000 function evaluations) on the 7-dimensional instance of the
function.

4 Analysis of the Results



In the experiments performed in this study, we set the resolution of the clustering
as $\alpha = 5$ and the threshold for outlier detection, $\gamma$, as in Sect. 2.2. We required
only one algorithm run for each data set to provide the presented results. These
experiments were performed on an Intel Pentium IV 1.5 GHz with 256 MB of
memory.

Firstly, the clustering was performed on all the synthetic data sets described
in the previous section (SD1 to SD9). The correct clustering was achieved for
all of them, meaning that the correct assignment of individuals to clusters was
found. Outliers were also identified correctly. No merge, as defined in Sect. 2.3,
occurred. Also, due to their linkage to cluster properties, a single choice of
parameters was valid for all experiments. The high variety of test data sets
demonstrates the robustness of the method. The highest CPU time used was 0.8 seconds
(with SD9).

Figure 4 provides a visualisation of the obtained clustering quality with SD1.
The five clusters in the data set are correctly revealed by CHIDID, which has
correctly assigned data points to clusters as well as discarded the outliers.




Fig. 4. Results for SD1. The upper plot shows the input data set without outliers, with
colors corresponding to the component value. The bottom plot shows the five clusters
found with CHIDID (outliers were correctly identified and are not included)



Likewise, Fig. 5 shows equally good performance on SD5, a data set with
the same number of points as SD1 but a much higher dimensionality (M = 100
instead of M = 20).




Fig. 5. Results for SD5. The upper plot shows the input data set without outliers,
with colors corresponding to the component value. The bottom plot shows the seven
clusters found with CHIDID (outliers were correctly identified and are not included)

Finally, we carry out clustering on the real data sets RD1 and RD2. The results
are shown in Figs. 6 and 7, respectively. In both cases, four clusters were found,
each of them associated with a different global optimum. The quality of the
clustering is clear from the figures.





Fig. 6. Results for RD1. The upper plot shows the input data set without outliers, with
colors corresponding to the component value. The bottom plot shows the four clusters
found with CHIDID (outliers were correctly discriminated and are not included). The
horizontal line underneath the plot marks the separation between clusters



5 Conclusions 

This paper proposes a new algorithm (CHIDID) to identify clusters of solutions 
in Multimodal Optimisation. To use this algorithm it is not necessary to provide 
a priori knowledge related to the expected clustering behaviour (eg. cluster ra- 
dius or number of clusters). The only input needed is the data set to be analysed. 
The algorithm does contain two tuning parameters. However, extensive
experiments, not reported in this paper, lead us to believe that the performance of the
algorithm is largely independent of the values of these two parameters. The values
used in this work are, we believe, suitable for most data sets. The output is
the assignment of data points to clusters, whose number is found automatically. 
Outliers are identified as a byproduct of the algorithm execution. 

CHIDID was tested with a variety of data sets. Synthetic data sets were used 
to study the robustness of the algorithm to different numbers of points, variables
and clusters. Also, real data sets, obtained by applying a GA on an analytical 
function, were considered. CHIDID has been shown to be efficient and effective 
at clustering all these data sets. 

Fig. 7. Results for RD2. The upper plot shows the input data set without outliers, with
colors corresponding to the component value. The bottom plot shows the four clusters
found with CHIDID (outliers were correctly discriminated and are not included). The
horizontal line underneath the plot marks the separation between clusters

A Outlier Detection



In Sect. 2.2, we introduced the parameter $\gamma$ as a threshold aimed at discriminating
outliers. We chose to fix $\beta = \gamma d_2$ with $0 < \gamma < 1$. In this section, we justify
these settings.

The relative linear density was defined as $\theta_l = \frac{\beta + d_N}{N} \frac{l}{\beta + d_l}$. Note that in case
$\gamma > 1$, $\beta$ might be greater than $d_l$ and hence the density might be notably
distorted. Thus, $\gamma$ is restricted to the interval $(0, 1)$ to ensure the accuracy of the
density measure.

Next, we develop the relative linear density increment as 



$\Delta\theta_l = |\theta_l - \theta_{l-1}| = \frac{\beta + d_N}{N} \left| \frac{l}{\beta + d_l} - \frac{l - 1}{\beta + d_{l-1}} \right|$ (6)



and evaluate the resulting expression for its first value 



$\Delta\theta_2 = \frac{\beta + d_N}{N} \left| \frac{2}{\beta + d_2} - \frac{1}{\beta} \right|$ . (7)



From the latter equation, if $\beta \to 0$ (ie. $\gamma \to 0$) then $\Delta\theta_2 \to +\infty$ and thus
$\Delta\theta_2$ will be the highest of all $\{\Delta\theta_l\}_{l=2}^{N}$. This would make the representative
always appear as an outlier. By contrast, if $\beta \to d_2$ (ie. $\gamma \to 1$) then $\Delta\theta_2 \to 0$
and thus $\Delta\theta_2$ will be the lowest of all $\{\Delta\theta_l\}_{l=2}^{N}$. This situation corresponds to
the representative never being an outlier.

We next substitute $\beta = \gamma d_2$ in (7) to give

$\Delta\theta_2 = \frac{\gamma d_2 + d_N}{N d_2} f(\gamma)$, with $f(\gamma) = \frac{1 - \gamma}{(1 + \gamma)\gamma}$ . (8)



Since $0 < \gamma < 1$, it is clear that $f(\gamma)$ controls the tendency of regarding a
representative as an outlier. Figure 8 presents the plot of $f(\gamma)$ against $\gamma$. The
figure shows that a higher value of $\gamma$ implies a higher tendency to include outliers
in clusters.
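
The substitution leading to (8) can be checked numerically with the short sketch below. The values of $d_2$, $d_N$ and $N$ are hypothetical and serve only to compare the direct evaluation of (7) with the closed form (8).

```python
def f_gamma(gamma):
    """f(gamma) = (1 - gamma) / ((1 + gamma) * gamma), as in (8)."""
    return (1.0 - gamma) / ((1.0 + gamma) * gamma)

def delta_theta_2(d2, dN, N, gamma):
    """Evaluate (7) directly with beta = gamma * d2."""
    beta = gamma * d2
    return (beta + dN) / N * abs(2.0 / (beta + d2) - 1.0 / beta)

d2, dN, N = 0.3, 12.0, 150          # hypothetical values for illustration
for gamma in (0.1, 0.5, 0.9):
    direct = delta_theta_2(d2, dN, N, gamma)
    closed = (gamma * d2 + dN) / (N * d2) * f_gamma(gamma)
    print(gamma, direct, closed)    # both columns should agree
```

Since $f(\gamma)$ decreases monotonically on $(0, 1)$, values of $\gamma$ close to 0 favour classifying the representative as an outlier, while values close to 1 favour keeping it in a cluster.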




Fig. 8. Plot of $f(\gamma)$ against $\gamma$

Abstract. In this paper we present a set of algorithms which represent
different implementations of strategies for revenue management for air
cargo airlines. The problem is to decide on the acceptance of a booking
request given a set of fixed accepted requests and expected further demand.
The strategies are based on a planning model for air cargo routing
which determines the maximal contribution to profit for the capacities of
a given flight schedule. This planning model associates an optimal set of
so-called itineraries/paths to the set of possible origin-destination pairs
according to given yield values. In mathematical terms the model is the
path flow formulation of a special multi-commodity flow problem. This
model is solved by intelligently applying column generation.



1 Introduction 

Revenue Management (RM) deals with the problem of effectively using perishable
resources or products in businesses or markets with high fixed cost and low
margins which are price-segmentable, such as airlines, hotels, car rental and
broadcasting (see Cross [1997] and McGill et al. [1999]). Revenue management has
to be supported by forecasting systems and optimization systems. In this paper
we focus solely on the optimization aspect.

RM has its origin and has found broad application in the airline business, and
here especially in passenger flight revenue management. Only recently have the
concepts developed for the passenger sector been adequately modified and
transferred to the cargo sector. An immediate transfer is not feasible since the
two businesses, although at first sight offering a similar kind of
transportation service, face differences in product offering and in production
which have consequences for capacity planning and revenue management. While the
demand in the passenger sector is one-dimensional (seat) and fairly smooth, the
demand in the air cargo sector is multi-dimensional (weight, volume,
classification) and lumpy. On the other hand, capacity in the passenger business
is fixed, while in the cargo business we face stochastic capacity, for instance
when using idle belly capacities of passenger flights.

Yet, the most subtle difference is as follows. In the passenger business the
service which customers are booking is a concrete itinerary, i.e. a sequence of
flight connections, so-called legs, leading the passenger from an origin to a
destination. Thus a passenger booking a flight from Cologne to Rio de Janeiro
via Frankfurt on a given date has to be given seat availability on two specific
flights: (CGN, FRA) and (FRA, GIG). In the cargo business the customers book a
transportation capacity from an origin to a destination, a so-called O&D, at a
certain service level, i.e. within a certain time window, and the airline has
some degree of freedom to assign booked requests to concrete flights later. For
the example above the cargo client would book the connection (CGN, GIG) and is
not especially interested in the actual path or routing for the package of
goods. This difference leads to different planning models.

Antes et al. [1998] have developed a model and introduced a decision support
system for evaluating alternative schedules by optimally assigning demand to 
flights. The model leads to a special kind of multi-commodity flow problem. 

In this paper we show and analyze how this model can be adapted to aid 
in solving the operational problems faced in revenue management. The paper 
is organized as follows. In section 2 we introduce the problem and the model 
for transportation planning in the air cargo business and in section 3 we show 
how this model can be solved by column generation. In section 4 we describe 
how this model can be extended to the problem of revenue management from a 
conceptual point of view. In section 5 we propose several alternative strategies 
for implementing the model into the revenue management process and in section 
6 we present first computational results for realistic problem scenarios. 



2 The Basic Planning Model for Air Cargo Routing 

A cargo airline offers the conceptually simple service to transport a certain 
amount of goods from an origin to a destination within a certain time interval
at a given price. For this purpose the airline keeps a fleet of aircraft and/or
leases capacities from other companies, especially ground capacities for trans- 
porting goods to, from, and between airports. The tactical planning problem of 
such airlines is to design a (weekly) flight schedule which allows the most prof- 
itable service of the unknown market demand for the next period. Then, on the 
operational level the requests have to be assigned to concrete flights based on 
the given flight schedule. 

From the verbal description of an air-cargo service given in the introduction
we can identify three basic entity types in our world of discourse: AIRPORT;
PRODUCT, the class of all conceptually different "product types"; and TIME,
being a tuple of three components: day of the week, hour and minute.

The flight schedule can be described as a set of so-called legs. Here a leg is 
a direct flight between two airports offered at the same time every week during 
the planning period. We model the object class LEG as a 4-ary (recursive) 
relationship-type between AIRPORT playing the role of an origin (from) and 
the destination (to), respectively, and TIME playing the role of a departure and 
arrival time, respectively. 

Every flight schedule defines a so-called time-space network \(G = (V,E)\) with
\(V\) the set of nodes representing the airports at a certain point of time.
\(E\) is composed of two subsets: a set of flight arcs which represent the legs
and connect the associated airports at departure and arrival time, respectively,
and a set of ground arcs which connect two nodes representing the same airport
at two consecutive points in time. Associated with every leg/arc \(e \in E\) is
the weight capacity \(u_e\) measured in kg, the volume capacity \(v_e\) measured
in cubic meters and the operating cost \(c_e\) measured in $.
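
The construction of the time-space network can be illustrated with a short Python sketch. The `Leg` record, the encoding of TIME as minutes since the start of the week, the function name and the sample legs are assumptions made for illustration; the weekly wrap-around of ground arcs is omitted for brevity.

```python
from collections import defaultdict
from typing import NamedTuple


class Leg(NamedTuple):
    origin: str
    dest: str
    dep: int     # departure time, minutes since start of week (assumed encoding)
    arr: int     # arrival time, same encoding
    u: float     # weight capacity in kg
    v: float     # volume capacity in cubic meters
    c: float     # operating cost in $


def build_time_space_network(legs):
    """Return (nodes, flight_arcs, ground_arcs) for a list of legs."""
    events = defaultdict(set)            # airport -> set of event times
    for leg in legs:
        events[leg.origin].add(leg.dep)
        events[leg.dest].add(leg.arr)
    # One node per airport and event time.
    nodes = {(a, t) for a, times in events.items() for t in times}
    # Flight arcs connect departure node and arrival node of each leg.
    flight_arcs = [((l.origin, l.dep), (l.dest, l.arr), l) for l in legs]
    # Ground arcs connect consecutive points in time at the same airport.
    ground_arcs = []
    for a, times in events.items():
        ordered = sorted(times)
        for t0, t1 in zip(ordered, ordered[1:]):
            ground_arcs.append(((a, t0), (a, t1)))
    return nodes, flight_arcs, ground_arcs


# Hypothetical sample schedule.
legs = [
    Leg("CGN", "FRA", 480, 530, 2000.0, 12.0, 900.0),
    Leg("FRA", "GIG", 600, 1320, 15000.0, 90.0, 40000.0),
    Leg("FRA", "GIG", 900, 1620, 15000.0, 90.0, 40000.0),
]
nodes, flight_arcs, ground_arcs = build_time_space_network(legs)
```

In this sample the ground arc from (FRA, 530) to (FRA, 600) is what allows freight arriving from CGN to wait for the first departure to GIG.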

Market demand can be described as a set of estimated or booked service
requests. Such a request has several attributes, with the origin and destination
airport being the dominant characteristic. Therefore, these request objects are
commonly referred to as "O&D's" or "O&D pairs". Conceptually the object class
OD can be modelled as a complex 5-ary (recursive) relationship type between
AIRPORT playing the role of an origin and a destination, respectively, TIME
playing the role of an availability and due time, respectively, and PRODUCT.

Then within the concept of network flow, the different O&D's define different
commodities which have to be routed through this network (see Ahuja et al.
[1993]). For the following models we abstract in our notation and use the
symbol \(k\) for representing an O&D commodity, and \(K\) is the set of all
commodities/O&D's. Associated with a commodity/O&D \(k \in K\) is a specific
demand \(b^k\) measured in kg, a value \(d^k\) giving the volume per kg and a
freight rate or yield value \(y^k\) measured in $ per unit of commodity \(k\).

The so-called path-flow model of the multi-commodity flow problem, i.e. the
problem to construct optimal assignments of O&D's/commodities to legs/arcs,
is based on the obvious fact that any unit transported from an origin to a
destination has taken a sequence of legs (possibly only one leg), a so-called
path or itinerary, connecting the origin node with the destination node in the
network. A path or itinerary for an O&D/commodity \(k\) is a sequence
\(p = (l_1, \ldots, l_{r(p)})\) of legs \(l_i \in\) LEG, \(r(p) \ge 1\), with the
following properties:

\[ \mathrm{origin}(l_1) = \mathrm{origin}(k) \]
\[ \mathrm{destination}(l_i) = \mathrm{origin}(l_{i+1}), \quad i = 1, \ldots, r(p) - 1 \]
\[ \mathrm{destination}(l_{r(p)}) = \mathrm{destination}(k) \]
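
The three chaining properties translate directly into a small check. This is only a sketch: representing a leg as an (origin, destination) pair and the function name `is_itinerary` are assumptions, and the additional k-feasibility requirements (due dates, transfer times, product compatibility) are deliberately not covered.

```python
def is_itinerary(legs, od_origin, od_destination):
    """Check the chaining properties for a sequence of (origin, destination) legs."""
    if not legs:                                  # at least one leg is required
        return False
    if legs[0][0] != od_origin:                   # first leg starts at the O&D origin
        return False
    if legs[-1][1] != od_destination:             # last leg ends at the O&D destination
        return False
    # Each leg must start where the previous leg ended.
    return all(a[1] == b[0] for a, b in zip(legs, legs[1:]))
```

For the example from the introduction, the sequence (CGN, FRA), (FRA, GIG) is accepted for the O&D (CGN, GIG), whereas a sequence with a gap between consecutive legs is rejected.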

A path \(p\) is called \(k\)-feasible if additional requirements are fulfilled
which vary with the problem definition. Here we consider several types of
constraints which concern due dates, transfer times and product compatibility.
Note that feasibility of paths is checked outside the decision model. For every
O&D/commodity \(k\) we denote by \(P^k\) the set of \(k\)-feasible itineraries.
A path may be feasible for many different O&D's. In our model we have to
distinguish these roles and consider multiple copies of the same path/the same
legs assigned to different commodities/O&D's. The relation between arcs and
itineraries is represented in a binary indicator
\(\delta : L \times S \to \{0,1\}\):

\[ \delta(l,p) = \begin{cases} 1 & \text{if leg } l \text{ is contained in path } p \\ 0 & \text{else} \end{cases} \]

Given an itinerary \(p \in P^k\) we can easily calculate

\[ c^k(p) := \sum_{e \in E} c_e\,\delta_e(p) = \sum_{e \in p} c_e, \]

the operating cost, as well as

\[ y^k(p) := y^k - \sum_{e \in p} c_e, \]

the yield per kg of commodity \(k\) which is transported over \(p\). This
calculation as well as the construction of the set \(P^k\) is done outside our
model using a so-called "connection builder" and then fed as input data into the
model. For the model we introduce for every \(p \in P^k\) a decision variable
\(f(p)\) giving the amount (in kg) transported via \(p\).

Now the planning problem is to select the optimal combination of paths 
giving maximal contribution to profit which leads to a standard linear (multi- 
commodity flow) program. 

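Planning model (P)

\[ \max \sum_{k \in K} \sum_{p \in P^k} y^k(p)\, f(p) \tag{1} \]

\[ \sum_{k \in K} \sum_{p \in P^k} \delta_e(p)\, f(p) \le u_e \quad \forall e \in E \tag{2} \]

\[ \sum_{k \in K} \sum_{p \in P^k} \delta_e(p)\, d^k f(p) \le v_e \quad \forall e \in E \tag{3} \]

\[ \sum_{p \in P^k} f(p) \le b^k \quad \forall k \in K \tag{4} \]

\[ f(p) \ge 0 \quad \forall k \in K,\ \forall p \in P^k \tag{5} \]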
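
To make the roles of the objective and the three constraint groups concrete, the following Python sketch evaluates a given path flow against a model of type (P). It does not solve the LP; the data layout (a dictionary of paths with commodity, legs, path yield and volume factor), the function name and all numbers are assumptions for illustration.

```python
from collections import defaultdict


def evaluate_flow(paths, flow, u, v, b, tol=1e-9):
    """Return (profit, feasible) for a path flow.

    paths: path id -> (commodity k, list of legs, path yield per kg, volume per kg)
    flow:  path id -> amount in kg routed over that path
    u, v:  leg -> weight / volume capacity;  b: commodity -> demand
    """
    weight = defaultdict(float)   # kg routed over each leg (weight constraint)
    volume = defaultdict(float)   # volume routed over each leg (volume constraint)
    served = defaultdict(float)   # kg served per commodity (demand constraint)
    profit = 0.0                  # objective value
    for pid, amount in flow.items():
        k, legs, y_kp, d_k = paths[pid]
        profit += y_kp * amount
        served[k] += amount
        for e in legs:
            weight[e] += amount
            volume[e] += d_k * amount
    feasible = (
        all(a >= -tol for a in flow.values())
        and all(weight[e] <= u[e] + tol for e in weight)
        and all(volume[e] <= v[e] + tol for e in volume)
        and all(served[k] <= b[k] + tol for k in served)
    )
    return profit, feasible


# Hypothetical instance with two legs and two commodities.
paths = {
    "p1": ("CGN-GIG", ["CGN-FRA", "FRA-GIG"], 1.5, 0.006),
    "p2": ("FRA-GIG", ["FRA-GIG"], 1.0, 0.005),
}
u = {"CGN-FRA": 2000.0, "FRA-GIG": 15000.0}
v = {"CGN-FRA": 12.0, "FRA-GIG": 90.0}
b = {"CGN-GIG": 1800.0, "FRA-GIG": 14000.0}
profit, feasible = evaluate_flow(paths, {"p1": 1500.0, "p2": 13000.0}, u, v, b)
```

Raising the flow on the second path to its full demand of 14000 kg would overload the shared leg, which is exactly the coupling between commodities that the LP has to resolve.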



The advantage of the itinerary-based model over leg-based flow models is the
possibility to consider rather general and complicated constraints for
feasibility of transportation in the path-construction phase via the connection
builder, i.e. keeping this knowledge away from the optimization model. This
reduces the complexity of the optimization phase and allows the same standard
(LP) solution procedure to be used for a wider range of different planning
situations.

Moreover, this approach allows for scaling, i.e. it is not necessary to construct
all possible paths beforehand. Working with a "promising subset" of profitable
paths only reduces the size of the problem instance but may lead to solutions
which, although not optimal in general, are highly acceptable in quality.
Finally, an approach called column generation allows feasible paths to be
generated on the fly during optimization and thus keeps problem size manageable
throughout the optimization process.



3 Solving the Planning Model via Column Generation 

Column generation goes back to Dantzig and Wolfe [1960] as an approach for
solving large linear programs with decomposable structures. It has become the
leading optimization technique for solving huge constrained routing and
scheduling problems (see Desrochers et al. [1995]). Due to degeneracy problems
column generation often shows unsatisfactory convergence. Recently, so-called
stabilization approaches have been introduced and studied which accelerate
convergence (see du Merle et al. [1999]). We did not adapt these advanced
techniques since our objective is not to solve single static instances of a hard
optimization problem to near-optimality in reasonable time, but to analyze
whether and how the concepts which have been shown to be appropriate for solving
air-cargo network design and analysis problems on a tactical level can be
applied on the operational level in a dynamic and uncertain environment. The
suitability and use of such a common modelling concept is necessary with respect
to consistency of network capacity planning and revenue control.

In the following we will only outline the column generation approach for
solving (P). Let us denote this problem by MP, which stands for master problem.
Now, instead of generating all feasible itineraries \(p \in P^k\), \(k \in K\),
and thus initializing MP, only a promising subset of itineraries
\(R^k \subseteq P^k\), \(k \in K\), is constructed in the initialization phase.
Such a subset can be obtained for instance by constructing and introducing for
every O&D the \(m\) feasible itineraries with maximum contribution to profit,
where \(m\) is a relatively small number.

Since in the model every path or itinerary constitutes a decision variable and
a column in the LP, this initialization generates for every O&D a subset of
columns associated with a subset of the set of all decision variables. This
defines a subproblem, the so-called restricted master problem (RMP). RMP is
again of the (P) type and thus can be solved by any standard LP technique, the
simplex method for instance. Solving RMP will generate a feasible freight
flow, since every feasible solution to a restricted problem is also feasible for
the original master problem.

Now the question arises whether the optimal solution for RMP is optimal for
MP, too. Solving RMP, by the simplex method for instance, we obtain for every
arc/leg \(e\) a shadow price \(w_e\) for the weight constraint and \(z_e\) for
the volume constraint. The demand constraint associates with every commodity
\(k\) a shadow price \(\sigma^k\). Based on these prices the so-called "reduced
cost coefficient" for an itinerary \(p \in P^k\) is defined as

\[ r(p) = y^k - \sum_{e \in p} (c_e + w_e + d^k z_e) - \sigma^k \tag{6} \]

LP-theory states that in the maximization case a feasible (basic) solution for an 
LP is optimal if and only if every non-basic variable has non-positive reduced 
cost. Thus, if in an optimal solution for RMP an itinerary \(p \in P^k\) has positive
reduced cost \(r(p)\) then transporting (one unit of) commodity \(k\) on \(p\) leads to a
solution for MP which is more profitable than the current optimal solution for 
RMP. 
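
The reduced cost coefficient (6) is a plain arithmetic expression once the dual prices are known. A minimal Python sketch, with argument names and dictionary-based data layout chosen here purely for illustration:

```python
def reduced_cost(y_k, sigma_k, d_k, path, c, w, z):
    """Reduced cost of an itinerary according to (6).

    y_k:     yield of commodity k
    sigma_k: shadow price of the demand constraint of k
    d_k:     volume per kg of commodity k
    path:    iterable of legs contained in the itinerary
    c, w, z: leg -> operating cost, weight shadow price, volume shadow price
    """
    return y_k - sum(c[e] + w[e] + d_k * z[e] for e in path) - sigma_k
```

A positive return value marks the itinerary as a candidate column for the restricted master problem.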

Thus in the second phase we calculate the reduced cost values for (all)
\(p \in P^k \setminus R^k\) and we check whether there exists a \(p \in P^k\) for
a commodity \(k\) with

\[ y^k - \sigma^k > \sum_{e \in p} (c_e + w_e + d^k z_e), \]

a so-called promising path. This phase is called outpricing.

A common and efficient approach for the outpricing phase is to solve a
shortest path problem for each \(k \in K\), i.e. to determine the shortest path
from the origin of \(k\) to the destination of \(k\) with respect to the
modified arc costs \((c_e + w_e + d^k z_e)\). If the shortest path length is at
least \(y^k - \sigma^k\) then no promising path exists. If this holds for all
\(k \in K\), this indicates that the optimal solution to RMP is optimal for MP.
Otherwise we define additional variables for all or some itineraries for which
\(r(p) > 0\) holds and generate the associated columns to be introduced into
RMP, which then has to be re-solved.
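
The outpricing step for a single commodity can be sketched as a shortest path computation with modified arc costs. The sketch below uses Dijkstra's algorithm, which assumes that all modified arc costs are non-negative; it ignores the additional feasibility constraints on itineraries, so it is an unconstrained simplification rather than the constrained shortest path code used in the experiments. The adjacency-list layout, the function name and the sample network are assumptions.

```python
import heapq


def price_commodity(arcs, source, target, y_k, sigma_k, d_k):
    """Return (reduced_cost, node_path) of a promising path, or None.

    arcs: node -> list of (next_node, c_e, w_e, z_e); all values assumed >= 0.
    """
    dist = {source: 0.0}
    pred = {}
    done = set()
    heap = [(0.0, source)]
    while heap:
        d, n = heapq.heappop(heap)
        if n in done:
            continue
        done.add(n)
        if n == target:
            break
        for m, c_e, w_e, z_e in arcs.get(n, []):
            nd = d + c_e + w_e + d_k * z_e        # modified arc cost
            if nd < dist.get(m, float("inf")):
                dist[m] = nd
                pred[m] = n
                heapq.heappush(heap, (nd, m))
    if target not in dist:
        return None                               # target not reachable
    r = y_k - sigma_k - dist[target]
    if r <= 0:
        return None                               # no promising path exists
    path = [target]
    while path[-1] != source:
        path.append(pred[path[-1]])
    path.reverse()
    return r, path


# Hypothetical network: the detour via GRU is cheaper under the current duals.
arcs = {
    "CGN": [("FRA", 0.5, 0.0, 0.0)],
    "FRA": [("GIG", 1.0, 0.25, 0.0), ("GRU", 0.5, 0.0, 0.0)],
    "GRU": [("GIG", 0.5, 0.0, 0.0)],
}
result = price_commodity(arcs, "CGN", "GIG", y_k=2.0, sigma_k=0.25, d_k=0.005)
```

Because the shadow price on the direct arc from FRA to GIG makes it more expensive than the detour, the pricing step proposes the path via GRU as a new column.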

The column generation concept which we have described above leaves a great
variety of strategies on how to implement the different phases. The basic
philosophy of column generation is to keep the restricted master problem RMP
rather small to speed up the LP solution. Yet, keeping RMP small requires testing
more columns during the outpricing phase. Thus there is a trade-off which has to
be evaluated by testing several options.

Accordingly, two different algorithms T1 and T2 were implemented to examine
the trade-off between calls to outpricing and calls to LP (re-)optimization. In
T1, RMP is re-solved and the dual prices are updated every time a new path is
found, while in T2 RMP is re-solved and the dual prices are updated only after
the path search has been executed for all commodities.

Another important aspect of applying the path-flow formulation and column 
generation concept to the problem of generating optimal freight flows is the 
fact that all constraints on od-feasibility can be checked in the outpricing phase 
and thus need not be represented in the LP-formulation and tested during LP- 
solution. This does not only reduce the complexity of the LP, but enables the 
consideration of arbitrary constraints on the feasibility of itineraries and allows 
the application of one model to a variety of problem types characterised by dif- 
ferent sets of constraints. Note that when the feasibility of itineraries is restricted 
by constraints, so-called constrained-shortest path problems CSPP have to be 
solved (see Nickel [2000]). 

4 Operational Models for Air Cargo Booking and 
Revenue Management 

The planning model can be used to evaluate alternative schedules, to identify
capacity bottlenecks, to support static pricing etc. In an operational
environment, given a flight schedule and pricing information, booking requests
will occur over time. Decisions have to be made concerning the acceptance or
rejection of requests due to limited resources and/or (expected) opportunity
costs, and appropriate capacity reservations have to be made. Yet, there is no
need for the airline to actually book a certain request to a specific itinerary,
i.e. set of legs. The airline has to ensure only that at any time there exists a
feasible routing for all booked requests. This aspect is represented in the
following booking model, where we distinguish between a set \(B\) of booked O&D
requests, with \(\beta^a\) the demand of a booked request \(a \in B\), and a
forecasted demand for a set \(K\) of O&D's. W.l.o.g. we can assume that \(K\) is
the set of all O&D's. Then the model determines the maximal contribution to
profit subject to the set of accepted/booked requests and the additional
expected demand.

Booking model

(11)

\[ f(p) \ge 0 \quad \forall k \in K \cup B,\ \forall p \in P^k \tag{12} \]


The next model represents the situation in revenue management where the
decision has to be made on the acceptance of a single booking request \(r\)
with demand \(\beta^r\) for a specific O&D, subject to a set \(B\) of already
booked requests and estimated further demand for the commodities \(K\).


Request model 

(19)



Note that for both models we assume an (expected) yield associated with
every commodity \(k \in K \cup B\). This yield has been used to calculate the
yield of the paths \(p \in P^k\) and thus is contained in the model only
implicitly.

The booking model can be used to evaluate the acceptability of a request at
a pre-specified yield as well as for "dynamic pricing", i.e. the determination
of the minimum acceptable yield. Conceptually, the first question can be
answered by comparing the value of the optimal solution of the booking model
without the actual request and the value of the optimal solution for the request
model. We will present several more efficient algorithmic ideas based on
computing promising paths in the sequel. Dynamic pricing can be supported by
modifying the procedure for solving the booking model. Here paths with
increasing opportunity costs are constructed sequentially until a set of paths
with sufficient capacity for fulfilling the additional demand is constructed.
Then the opportunity costs of these paths give the minimum acceptable yield, a
concept which is often called "bid price". We do not address aspects of dynamic
pricing in this paper.



5 Strategies for Applying the Request Model in Revenue 
Management 

When incorporating this modelling approach into a (real) revenue management
process the specification of expected demand is essential. For this purpose the
optimization module has to be integrated with a forecasting module where
calibrated forecasting methods update the demand for O&D's. Thus we assume for
the following that such a forecasting module triggers the update of \(b^k\) and
\(y^k\) for \(k \in K\) in the booking model, which is then re-solved by the
optimization module. This computation is done outside the revenue management
process which we describe in the following. Thus for our algorithms we assume
that we always work on the optimal solution for a given booking model.

Now, there is one problem with applying and simulating a procedure for
handling a series of booking requests. In a realistic environment the forecasted
demand would be updated over time through the forecasting module, thus changing
the booking model, its optimal freight flow and its optimal dual prices. Yet,
there will always exist a feasible solution with respect to the booked requests.
Associated with a request \(r\) is a commodity \(k\) for which a certain
forecast is contained in the booking model, and thus \(\beta^r\) is "contained"
in the forecast \(b^k\). Thus, after processing request \(r\) the expected
demand for commodity \(k\) has to be adjusted, especially if the demand of
requests is lumpy. In our experiments we take this into account by reducing the
demand \(b^k\) by \(\beta^r\) for the next requests.
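
The adjustment of the forecast after a request has been processed can be sketched in a few lines of Python. Flooring the remaining forecast at zero is an assumption added here, since a request may exceed the remaining forecast; the function name and the dictionary layout are likewise illustrative.

```python
def consume_forecast(forecast, k, beta_r):
    """Return a copy of the forecast with the demand of commodity k reduced by beta_r.

    The result is floored at zero (an assumption; the text only states that the
    forecast is reduced by the demand of the processed request).
    """
    updated = dict(forecast)
    updated[k] = max(0.0, updated.get(k, 0.0) - beta_r)
    return updated
```

Returning a copy keeps the previous forecast available, which is convenient when a simulation run has to be repeated with a different strategy.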

In the following we first discuss the basic applications of the request model
with one additional request only, and we assume that this request has to be
evaluated in competition with booked requests and expected further demand.
Such a request is characterized by the data \((r, \beta^r, y^r)\). The basic
algorithmic idea for (on-line/real-time) processing is as follows: Start from an
optimal primal and dual solution of the booking model without the actual request
\(r\), called the master model MP, and sequentially determine feasible paths of
maximal reduced cost until the demand of the request is fulfilled.

Algorithm S1

let f be the optimal primal solution of the MP
while \(\beta^r - \sum_{p \in P^r} f(p) > 0\)   // i.e. demand of request r is not fulfilled
do
    determine p, the r-feasible path with maximal reduced cost
    if the reduced cost is \(\le 0\) then
        reject request r and restore the old MP, STOP
    else
        introduce p into the master problem MP
        (re-)solve MP and read new solution f
    end if
end do
accept the request and update the booking model by introducing the request
into B: let \(B := B \cup \{r\}\) and introduce an equality constraint for r


This procedure does not always yield the correct answer. A request \(r\) which
improves the expected contribution to profit may be rejected since S1 tries to
ship the complete demand \(\beta^r\) over itineraries with positive reduced
costs only. Yet, using some itineraries with negative reduced costs could lead
to an improvement with respect to total contribution.

In modification S2 we do not compute the path(s) for serving request \(r\)
"from scratch" based on the primal and dual solution of MP. Here we make use
of the fact that we may already serve forecasted demand for the same O&D \(k\)
in the optimal solution of MP, and we use the associated itineraries for serving
request \(r\). Only in the case that these path capacities are insufficient
would we compute additional shortest paths as in S1. Yet, there is one problem
that has to be handled appropriately. The itineraries in the optimal MP solution
which are associated with a forecasted demand for a commodity \(k\) have been
selected based on an expected revenue \(y^k\). Thus, at this point of the
decision process we have to compare \(y^k\) with \(y^r\), the actual yield for
request \(r\).

In the algorithms formulated so far we perform each outpricing and path
determination for a single request on the basis of the "true" dual values and
opportunity costs. Thus the master problem has to be re-solved after each
"augmentation" to update these cost values. In time-critical environments like
on-line booking such an exact optimization may be much too costly, and
strategies have to be applied which reduce the number of dual price
determinations, possibly at the cost of not getting maximal contribution to
profit. Thus, in algorithm S3 we "freeze" the dual prices as long as we can find
profitable paths, and we set the dual value/shadow price of arcs/legs which have
become saturated to Big M, a sufficiently large value, which prevents
constructing itineraries which use these arcs in the next steps.

Note that applying algorithm S3 we will not always obtain the solution with 
maximal contribution to profit. 

Algorithm S2

let \(PF := \sum_{p \in P^r} f(p)\), let \(RF\) be the additional capacity of paths
\(p \in P^r\) with \(f(p) > 0\), and let \(RZ\) be the additional capacity of paths
\(p \in P^r\) in MP with \(f(p) = 0\)
if \(PF + RF + RZ < \beta^r\)   // not enough capacity available
then
    apply algorithm S1
else
    if \(y^k \le y^r\) and \(PF + RF \ge \beta^r\)   // capacity is available and r is profitable
    then
        assign \(Q := \{p \in P^k : f(p) > 0\}\) to r, modifying \(y^k(p)\) using \(y^r\)
        introduce r into B and introduce an equality constraint for r
        accept r and re-solve the booking model
    else   // capacity available but r may be unprofitable
        reduce \(b^k\) by \(\beta^r\) and solve the modified booking model,
        let z be the optimal value
        assign \(Q := \{p \in P^k : f(p) > 0\}\) to r, modifying \(y^k(p)\) using \(y^r\)
        introduce r into B and introduce an equality constraint for r
        solve the modified booking model, let z' be the optimal value
        if z' > z then accept request r
        else reject request r and restore the old MP
        end if
    end if
end if

Algorithm S3

set the remaining demand \(rd := \beta^r\)
while rd > 0 and there exists an r-feasible path p with positive reduced cost
do
    path search for request r
end while
if rd > 0 then   // re-solve and try again
    solve the booking model with inequality constraint for r
    while rd > 0 and there exists an r-feasible path p with positive reduced cost
    do
        path search for request r
    end while
end if
if rd = 0
then
    accept request r
    introduce r into B and introduce an equality constraint for r
else
    reject the request and restore the old MP
end if



This algorithm makes use of the following subroutine 

Algorithm "path search for request r"

compute r-feasible path p of maximal reduced cost
let q be the capacity of p
reduce capacity of arcs in p by min{rd, q}
reduce rd by min{rd, q}
set dual variables of arcs with zero capacity to Big M
introduce p into MP
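
The bookkeeping of one path search step with frozen dual prices can be sketched in Python as follows. This is a simplification under stated assumptions: the path of maximal reduced cost is supplied by the caller, only a single capacity dimension per arc is tracked (the model itself has weight and volume), the value chosen for Big M is arbitrary, and introducing the path into MP is left out.

```python
BIG_M = 1e9   # assumed "sufficiently large" dual value


def path_search_step(path, residual, dual, rd):
    """Ship as much of the remaining demand rd as possible over the given path.

    path:     list of arcs of the r-feasible path found by the pricing step
    residual: arc -> remaining capacity (updated in place)
    dual:     arc -> frozen dual price (updated in place for saturated arcs)
    Returns the remaining demand after this step.
    """
    q = min(residual[e] for e in path)     # capacity of the path
    shipped = min(rd, q)
    for e in path:
        residual[e] -= shipped
        if residual[e] <= 0:               # arc is saturated
            dual[e] = BIG_M                # blocks the arc in later searches
    return rd - shipped
```

Calling the function repeatedly with newly priced paths reproduces the inner while-loop of Algorithm S3 until either the demand is covered or no profitable path remains.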

A proper strategy to reduce the need for real-time computing is to accept
bookings which are uncritical with respect to capacity and apparently profitable
without evaluation and optimization, and to update the set of booked requests
and construct a feasible freight flow using the booking model in a
batch-processing mode. In our fourth strategy (Algorithm S4) we give up the
requirement to decide on the acceptance of requests sequentially one by one;
instead we accumulate requests into blocks and then decide which requests are
accepted in one single optimization run per block.

Algorithm S4

Phase 1 (collection)
for all requests \(r = r(k, \beta^r, y^r)\) in the block do:
    introduce into MP an inequality constraint for r with right-hand side \(\beta^r\)
end for

Phase 2 (decision)
repeat
    solve MP and update primal solution f and dual variables
    for all requests \(r = r(k, \beta^r, y^r)\) in the block do:
        if there exists an r-feasible path p with positive reduced cost then
            compute r-feasible path p of maximal reduced cost
            introduce p into MP
        end if
    end for
until no path was found in last repeat-loop run
for all requests \(r = r(k, \beta^r, y^r)\) in the block do:
    if \(\beta^r - \sum_{p \in P^r} f(p) = 0\)   // demand of request r is fulfilled
    then
        accept request r
        introduce r into B and introduce an equality constraint for r
    else
        reject request r
        remove the inequality constraint and all paths found for r from MP
    end if
end for

6 Computational Results

The algorithms which we have described in section 5 have been implemented in
Microsoft Visual C++ and have been applied to several real-world problems on a
PC with a Pentium III processor with 600 MHz and 256 MB RAM under Windows 98.
The LPs were solved using the ILOG CPLEX 6.5 solver. For solving the
constrained shortest path problems we have used a proprietary code (see Nickel
[2000]).

Our test problems were generated from 3 real-world planning problems of a
cargo airline representing specific markets (see Zils [2000]):

- Problem P10 with 10 airports, 624 legs and 1338 O&D's

- Problem P64 with 64 airports, 1592 legs and 3459 O&D's

- Problem P79 with 79 airports, 1223 legs and 1170 O&D's

In our experiment we have used the demand of the planning situation and we 
have generated a sequence of requests over time. First we have split the demand 
randomly into lumpy requests each having a demand between 20% and 100% of 
the total demand. Then we applied a random perturbation in the range of -10% 
to +10% to the demand of every request. With this procedure, we obtained a 
number of requests which was about twice the number of O&D’s (2533, 6521, 
and 2206, respectively). 
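
One possible reading of this generation procedure is sketched below in Python. Several details are assumptions, since the text leaves them open: each lump is drawn uniformly between 20% and 100% of the O&D's total demand, the last lump of an O&D is simply the remainder (and may therefore be smaller than 20%), and the requests of all O&D's are shuffled to obtain a sequence over time. The function name and data layout are hypothetical.

```python
import random


def generate_requests(demand, rng):
    """Split each O&D demand into lumpy requests and perturb each by -10% to +10%.

    demand: O&D -> total demand in kg;  rng: a random.Random instance.
    Returns a shuffled list of (O&D, request demand) pairs.
    """
    requests = []
    for k, total in demand.items():
        remaining = total
        while remaining > 1e-9:
            share = rng.uniform(0.2, 1.0) * total      # lump of 20%..100% of the total
            lump = min(share, remaining)               # last lump is the remainder
            remaining -= lump
            lump *= 1.0 + rng.uniform(-0.1, 0.1)       # random perturbation
            requests.append((k, lump))
    rng.shuffle(requests)                              # arrival order over time
    return requests
```

Passing an explicitly seeded `random.Random` instance makes a generated request sequence reproducible across the different algorithms that are compared.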




Fig. 1. Processing time of the different algorithms 



For all runs we used the solution of the planning problem obtained by
Algorithm T1 as starting situation. For algorithms S1 and S2 we also performed
test runs based on planning solutions obtained by T2. For algorithm S4 we have
performed tests with different block sizes of 5, 10, 100 and 1000 requests.
Moreover we have generated a test run with a variable block size. Here, starting
with a block size of 1% of the number of O&D's, the size was doubled after the
processing of 10 blocks until after 40 blocks the block size was held constant.
This experiment should represent the situation that closer to the deadline the
number of requests per time unit increases while the decision is made within
constant time intervals, and thus the number of requests per block is
increasing. Finally we performed a test (indicated by "a") where all requests
were put into one single block.
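
The variable block size schedule can be made explicit with a short Python sketch. The text does not say whether the size is doubled once more at block 40; the sketch assumes doubling after blocks 10, 20 and 30 only and a constant size from then on. Rounding 1% of the number of O&D's to the nearest integer is a further assumption, as are the function and parameter names.

```python
def block_sizes(num_ods, num_requests, doubling_every=10, freeze_after=40):
    """Return the list of block sizes used to partition num_requests requests."""
    size = max(1, round(0.01 * num_ods))       # initial size: 1% of the O&D's
    sizes = []
    remaining = num_requests
    while remaining > 0:
        n = len(sizes)                         # number of blocks processed so far
        if n > 0 and n % doubling_every == 0 and n < freeze_after:
            size *= 2                          # double after every 10 blocks
        take = min(size, remaining)            # last block may be smaller
        sizes.append(take)
        remaining -= take
    return sizes


# Problem P10: 1338 O&D's and 2533 requests.
sizes_p10 = block_sizes(1338, 2533)
```

Under these assumptions problem P10 starts with blocks of 13 requests and ends with blocks of 104 requests.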

Figure 1 gives the result on the average processing time per call to the
algorithms in ms. For all problem instances algorithm S3 and algorithm S4 with
small block size have the smallest processing time, which indicates that the
means to reduce the effort are effective. Comparing algorithms S1 and S2, which
focus more on profitability, we can see that using the paths of the MP solution
pays off.




Fig. 2. Contribution to profit of different solutions (axes: test problem, algorithm)



In Figure 2 we compare the contribution to profit which is obtained, i.e. we
state the optimal values as fractions of the planning solution. Note that by our
modification of the demand we could have generated instances where the value of
the solution in the dynamic revenue management scenario could outperform the
solution for the planning problem. Therefore we have taken the value obtained
when applying algorithm S4/a before rejecting those requests which cannot be
served completely as reference value. This is indicated by "ref". The results
show that the algorithms differ significantly with respect to quality. Algorithm
S2, which is comparable to S1 with respect to computation time, outperforms S1
and is thus preferable. Algorithm S4, even when using small block sizes, is
inferior to S3. Thus the assumption that one should postpone decisions is
questionable. Also, comparing algorithm S3 with algorithms S1 and S2 we see that
the quality of S3 is only slightly inferior, while on the other hand running
time is significantly smaller.

Fig. 3. Number of subproblems to be solved per request

Fig. 4. Number of master problems to be solved



To analyse the computational behaviour of the different algorithms we have
counted the number of sub-problems, i.e. shortest path problems, which have to
be solved. Figure 3 gives the average number per request. For each algorithm,
the numbers do not differ very much for the three test problems. It is notable
that algorithm S2 has to compute the smallest number of paths. Figure 4 displays
results on the number of master problems which have to be solved. Here we see
that algorithms S3 and S4 need to solve significantly fewer LPs, which accounts
for the smaller processing time of S3 and S4. Further analysis shows that
algorithm S3 (and S4) are more efficient than S2 only in those cases where the
master LPs are large enough, i.e. the number of O&D's is large enough.

Overall, algorithm S2 or algorithm S3 seem to be the choice among the
algorithms which we have proposed and tested. Here we can say that with
respect to running time the relation between the effort to solve the master LPs
and to solve the path problems is crucial. The advantage of S2 with respect
to quality becomes more significant if capacities become more scarce. There
is no general best algorithm. In particular, the running time for solving the
LPs becomes unacceptably high for problem instances of larger size. Thus in
these realistic and time-critical environments additional means to reduce the
size of the master problems should be applied. Here one could reduce the number
of paths in the search by limiting the allowable number of legs per path.
Another option would be the decomposition of the network. Then the first step in
the revenue management decision would be to assign every request to a suitable
sub-network.
Abstract. Over the years, many techniques have been established to
solve NP-hard optimization problems, and in particular multiobjective
problems. Each of them is efficient on certain types of problems or
instances. We can distinguish exact methods, dedicated to solving small
instances, from heuristics, and particularly metaheuristics, which approximate
the best solutions on large instances. In this article, we first present
an efficient exact method, called the two-phases method. We apply it to
a biobjective Flow Shop Problem to find the optimal set of solutions.
Exact methods are limited by the size of the instances, so we propose
an original cooperation between this exact method and a Genetic
Algorithm to obtain good results on large instances. The results obtained are
promising and show that cooperation between antagonistic optimization
methods can be very efficient.



1 Introduction 

A large part of real-world optimization problems are of multiobjective nature.
In trying to solve Multiobjective Optimization Problems (MOPs), many methods
scalarize the objective vector into a single objective. For several years,
interest in Pareto approaches to MOPs has been growing steadily. Many of
these studies use Evolutionary Algorithms to solve MOPs [4,5,23], and only a few
approaches propose exact methods such as a classical branch and bound with
a Pareto approach, an ε-constraint method and the two-phases method.

In this paper, we propose to combine the two types of approaches: a meta- 
heuristic and an exact method. Therefore, we firstly present a two-phases method 
developed to exactly solve a BiObjective Flow Shop Problem (BOFSP) [11]. In 
order to optimize instances which are too large to be solved exactly, we propose 
and present cooperation methods between Genetic Algorithms (GAs) and the 
two-phases method. 

In Section 2, we define MOPs and present a BOFSP. In Section 3,
we present the two-phases method applied to the BOFSP, together with computational
results. In Section 4, we present cooperation schemes between a GA and the
two-phases method. Section 5 presents results on unsolved instances. In the last



C.C. Ribeiro and S.L. Martins (Eds.): WEA 2004, LNCS 3059, pp. 72—86, 2004. 
© Springer- Verlag Berlin Heidelberg 2004 




Cooperation between Branch and Bound and Evolutionary Approaches 






section, we discuss the effectiveness of this approach and the perspectives of this
work.

2 A Bi-objective Flow Shop Problem (BOFSP) 

2.1 Multiobjective Optimization Problems (MOPs) 

Although a single-objective optimization problem may have a unique optimal solution,
MOPs present a set of optimal solutions which are proposed to a decision maker.
So, before presenting the BOFSP, we have to describe and define MOPs in the
general case. We assume that a solution x to such a problem can be described
by a decision vector $(x_1, x_2, \ldots, x_n)$ in the decision space X. A cost function
$f : X \to Y$ evaluates the quality of each solution by assigning it an objective
vector $(y_1, y_2, \ldots, y_p)$ in the objective space Y (see Fig. 1). So, multiobjective
optimization consists in finding the solutions in the decision space optimizing
(minimizing or maximizing) p objectives.





Fig. 1. Example of MOP 



For the following definitions, we consider the minimization of p objectives. In
the case of single-objective optimization, comparison between two solutions x
and x' is immediate. For multiobjective optimization, comparing two solutions
x and x' is more complex. Here, there exists only a partial order relation, known
as the Pareto dominance concept:

Definition 1. A solution x dominates a solution x' if and only if:

$\forall k \in [1..p],\ f_k(x) \le f_k(x')$

$\exists k \in [1..p] \ / \ f_k(x) < f_k(x')$

In MOPs, we are looking for Pareto Optimal solutions: 

Definition 2. A solution is Pareto optimal if it is not dominated by any other 
solution of the feasible set. 
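
Definitions 1 and 2 can be illustrated with a minimal sketch, assuming minimization and objective vectors given as tuples. The function names `dominates` and `pareto_filter` are illustrative and do not come from the paper.

```python
from typing import List, Sequence, Tuple


def dominates(fx: Sequence[float], fy: Sequence[float]) -> bool:
    """Return True if objective vector fx Pareto-dominates fy (minimization)."""
    no_worse = all(a <= b for a, b in zip(fx, fy))
    strictly_better = any(a < b for a, b in zip(fx, fy))
    return no_worse and strictly_better


def pareto_filter(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep only the points that are not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    candidates = [(1, 5), (2, 6), (3, 3), (5, 1), (4, 4)]
    print(pareto_filter(candidates))
```

This quadratic filter is only meant to make the definitions concrete; it is not tuned for large solution sets.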



The set of optimal solutions in the decision space X is called the Pareto
set, and its image in the objective space is the Pareto front. Here we are interested
in an a priori approach where we want to find all the Pareto solutions.

2.2 Flow Shop Problem (FSP)

The FSP is one of the numerous scheduling problems. The flow-shop problem has
been widely studied in the literature. Proposed methods for its resolution range
from exact methods, such as the branch & bound algorithm [16], to specific
heuristics [15] and meta-heuristics [14]. However, the majority of works on the flow-shop
problem study it in its single-criterion form and aim mainly to
minimize the makespan, which is the total completion time. Several bi-objective
approaches exist in the literature. Sayin et al. proposed a branch and bound
strategy to solve the two-machine flow-shop scheduling problem, minimizing the
makespan and the sum of completion times [16]. Sivrikaya-Serifoglu et al.
proposed a comparison of branch & bound approaches for minimizing the makespan
and a weighted combination of the average flowtime, applied to the two-machine
flow-shop problem [17]. Rajendran proposed a specific heuristic to minimize the
makespan and the total flowtime [15]. Nagar et al. proposed a survey of the
existing multicriteria approaches to scheduling problems [14].

The FSP can be presented as a set of N jobs $J_1, \ldots, J_N$ to be scheduled on M
machines. Machines are critical resources: one machine cannot be assigned to two
jobs simultaneously. Each job $J_i$ is composed of M consecutive tasks $t_{i1}, \ldots, t_{iM}$,
where $t_{ij}$ represents the task of job $J_i$ requiring machine $m_j$. Each
task $t_{ij}$ has an associated processing time $p_{ij}$. Each job $J_i$ must be completed before
its due date $d_i$. In our study, we are interested in the permutation FSP, where jobs
must be scheduled in the same order on all the machines (Fig. 2).

Fig. 2. Example of permutation Flow-Shop



In this work, we minimize two objectives: $C_{max}$, the makespan (total
completion time), and T, the total tardiness. Each task $t_{ij}$ being scheduled at
time $s_{ij}$, the two objectives can be computed as follows:

$C_{max} = \max \{ s_{iM} + p_{iM} \mid i \in [1..N] \}$

$T = \sum_{i=1}^{N} \max(0,\ s_{iM} + p_{iM} - d_i)$
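
As a hedged illustration of the two formulas above, the following sketch evaluates a permutation schedule. It assumes that every task starts as early as possible (the paper does not state how the start times are fixed), and the names `evaluate`, `p` and `d` are illustrative.

```python
from typing import List, Sequence, Tuple


def evaluate(perm: Sequence[int],
             p: List[List[int]],
             d: List[int]) -> Tuple[int, int]:
    """Return (makespan, total tardiness) of a permutation flow-shop schedule.

    perm    -- job indices in processing order (same order on all machines)
    p[i][j] -- processing time of job i on machine j
    d[i]    -- due date of job i
    Assumption: each task starts as soon as its machine and its job are free.
    """
    machines = len(p[0])
    machine_free = [0] * machines  # completion time of the last task per machine
    tardiness = 0
    for i in perm:
        job_free = 0  # completion time of job i on the previous machine
        for j in range(machines):
            start = max(machine_free[j], job_free)  # s_ij
            machine_free[j] = start + p[i][j]
            job_free = machine_free[j]
        # job_free is now s_iM + p_iM, the completion time of job i
        tardiness += max(0, job_free - d[i])
    return machine_free[-1], tardiness


if __name__ == "__main__":
    p = [[3, 2], [1, 4], [2, 1]]
    d = [5, 6, 7]
    print(evaluate([0, 1, 2], p, d))
    print(evaluate([1, 0, 2], p, d))
```

On this toy instance the second permutation improves both objectives, so it dominates the first in the sense of Definition 1.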

In the notation of Graham et al., this problem is denoted [7]:
$F/perm, d_i/(C_{max}, T)$. $C_{max}$ minimization has been proven to be NP-hard
for more than two machines [12]. The total tardiness objective T
has been studied only a few times for M machines [9], but total tardiness
minimization for one machine has been proven to be NP-hard [6]. The
performance of our algorithm has been evaluated on some
Taillard benchmarks for the FSP [18], extended to the bi-objective case [20]
(bi-objective benchmarks and the results obtained are available on the web at
http://www.lifl.fr/~basseur).



3 An Exact Approach to Solve BOFSP: The Two-Phases 
Method (TPM) 

On the Pareto front, two types of solutions may be distinguished: the supported
solutions, which may be found by means of linear combinations of the criteria, and the
non-supported solutions [21]. As supported solutions are the vertices of the convex
hull, they are nearer to the ideal optimal solution, and one may ask why it is
important to look for non-supported solutions. Figure 3 shows the importance
of non-supported solutions. It represents the Pareto front for one instance of the
bicriteria permutation flowshop with 20 jobs and 5 machines. This figure shows
that, for this example, only two Pareto solutions are supported (the extremes),
so to get a good compromise between the two criteria it is necessary to choose
one of the non-supported solutions.




Fig. 3. Importance of non supported solutions (Pb: ta_20_5_02) 



Many heuristic methods exist to solve multicriteria (and bicriteria)
problems. In this section we are interested in developing an exact method able to
enumerate all the Pareto solutions of a bicriteria flowshop problem.

A method called the Two-Phases Method was proposed by Ulungu
and Teghem, initially to solve a bicriteria assignment problem [21]. This method
is in fact a very general scheme that can be applied to other problems under
certain conditions. It has not yet been used very often for scheduling applications,
where the best-known exact method for bicriteria scheduling problems is the
ε-constraint approach proposed by Haimes et al. [8]. This section presents the
application of the scheme of the two-phases method to the bicriteria flow shop
problem under study.

3.1 The Two-Phases Method

Here we present the general scheme of the method. It proceeds in two phases. The 
first phase finds all the supported solutions and the second all the non-supported 
ones. 




Fig. 4. Search direction. 





Fig. 6. Non-supported solutions.



- The first phase consists in finding supported solutions with aggregations of
the two objectives $C_1$ and $C_2$ in the form $\lambda_1 C_1 + \lambda_2 C_2$. It starts by finding
the two extreme efficient solutions, which are two supported solutions. Then it
looks recursively for supported solutions between two already
found supported solutions $z^r$ and $z^s$ (we suppose $z_1^{(r)} < z_1^{(s)}$ and $z_2^{(r)} > z_2^{(s)}$)
according to a direction perpendicular to the line $(z^r z^s)$ (see figure 4), while
defining $\lambda_1$ and $\lambda_2$ as follows: $\lambda_1 = z_2^{(r)} - z_2^{(s)}$, $\lambda_2 = z_1^{(s)} - z_1^{(r)}$. Each new
supported solution generates two new searches (see figure 5).

- The second phase consists in finding non-supported solutions. Graphically,
any non-supported solution between $z^r$ and $z^s$ belongs to the triangle
represented in figure 6. This triangle is defined by $z^r$, $z^s$ and Y, which is the point
$(z_1^{(s)}, z_2^{(r)})$. Hence, the second phase consists in exploring all the triangles
underlying each pair of adjacent supported solutions, in order to find the
non-supported solutions.
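
The recursion of the first phase can be sketched as follows. This is a simplified illustration, not the authors' implementation: the exact monocriterion solver is replaced by a brute-force minimum over an explicit list of objective vectors, and the names `solve_weighted`, `lex_min` and `supported_solutions` are assumptions made for the example. Only the extreme supported points (the vertices of the convex hull) are returned.

```python
from typing import List, Set, Tuple

Point = Tuple[int, int]


def solve_weighted(points: List[Point], l1: int, l2: int) -> Point:
    """Stand-in for the exact solver: minimize l1*C1 + l2*C2 by enumeration."""
    return min(points, key=lambda z: (l1 * z[0] + l2 * z[1], z))


def lex_min(points: List[Point], first: int) -> Point:
    """Extreme solution: optimize one objective, then the other (lexicographic)."""
    return min(points, key=lambda z: (z[first], z[1 - first]))


def supported_solutions(points: List[Point]) -> List[Point]:
    """Dichotomic search for the extreme supported points (minimization)."""
    z_r, z_s = lex_min(points, 0), lex_min(points, 1)
    found: Set[Point] = {z_r, z_s}

    def search(zr: Point, zs: Point) -> None:
        # Weights giving a search direction perpendicular to the line (zr, zs).
        l1 = zr[1] - zs[1]
        l2 = zs[0] - zr[0]
        z = solve_weighted(points, l1, l2)
        # A strictly better aggregated value means a new supported point.
        if l1 * z[0] + l2 * z[1] < l1 * zr[0] + l2 * zr[1]:
            found.add(z)
            search(zr, z)
            search(z, zs)

    if z_r != z_s:
        search(z_r, z_s)
    return sorted(found)


if __name__ == "__main__":
    front = [(1, 10), (2, 7), (4, 6), (5, 3), (8, 2), (6, 6)]
    print(supported_solutions(front))
```

In this toy set the point (4, 6) is Pareto optimal but non-supported: it lies inside the triangle defined by (2, 7), (5, 3) and Y = (5, 7), and would only be found by the second phase.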



3.2 Applying the Two-Phases Method to a Bicriteria Flow Shop 
Problem 

The interesting point of the two-phases method is that it solves a bicriteria
problem exactly without studying the whole search space. Hence we want to apply
it to solve the BOFSP, for which the complete search space is too large to allow
complete enumeration. But this method is only a general scheme, and applying it
to a given problem requires a monocriterion exact method to solve the aggregations.

As this scheduling problem (even in its monocriterion form) is NP-hard, we
decided to develop a branch-and-bound method. A large part of the success of a
branch-and-bound depends on the quality of its lower bounds. As makespan
minimization has been widely studied, we have adapted an existing bound for
this criterion, whereas for the total tardiness we propose a new bound. Details
about these bounds may be found in [11].

The search strategy used is a depth-first search where, at each step, the
node with the best bound is chosen. Moreover, a large part of the tardiness (T)
value is generated by the last scheduled jobs. So the construction of solutions
places jobs either at the beginning or at the end of the schedule, in order to obtain
a precise estimation of the final T value quickly.



3.3 Improvements of the Two-Phases Method 



The two-phases method can be applied to any bicriteria problem. Applying it
to scheduling problems allows some improvements:

- Search of the extremes: The calculation of the extremes may be very long for
scheduling problems, as there exist many solutions with the same value for
one criterion. Hence, we propose to find the extremes in a lexicographic order:
one criterion is optimized first, and then the second, without degrading the first
one.

- Search intervals: The objective of the first phase is to find all the supported
solutions in order to reduce the search space of the second phase. But when
supported solutions are very near to each other, it is not worthwhile to look
for all of them, as this is very time consuming. Moreover, in the second
phase, the search is in fact not restricted to the triangle shown in figure 6
but covers the whole rectangle with opposite corners Y and the origin 0.
Hence, during the second phase,
it is possible to find supported solutions that still exist. To avoid
unproductive branch-and-bound runs, we therefore propose to execute the first phase only
between solutions far from each other (a minimal distance is used).



3.4 Results 



Table 1 presents results obtained with the two-phases method on the studied
problems. The first column gives the instance name, in the form
ta_<number of jobs>_<number of machines>_<number of the instance>. The three following
columns indicate the computational time of three different versions: the original
two-phases method, the method with the proposed improvements, and its parallel
version¹. The table shows that both the improvements and the parallelization allow
problems to be solved faster. Sequential runs were executed on a 1.00 GHz machine. The
parallel version was executed on eight 1.1 GHz machines.



¹ The parallel version is described in [11].
Table 1. Results of two-phases method (computational time).

Instance      Original method   With improvements   With parallelization
ta_20_5_01    30"               17"                 no need
ta_20_5_02    15'               14'                 no need
ta_20_10_01   one week          2 days              1 day
ta_20_10_02   one week          2 days              1 day
ta_20_20_01   Unsolved          Unsolved            few weeks



4 Using the Two-Phases Approach to Approximate the 
Optimal Pareto Front 

4.1 Motivations 

Exact methods are always limited by the size of the problem. Moreover, when the 
optimal Pareto front is not reached, these methods do not give good solutions. So, 
for these problems, heuristics are usually proposed. In this section, we propose 
to use the adaptation of the TPM to improve Pareto fronts obtained with a 
heuristic. Firstly, we briefly present the hybrid GA which will cooperate with 
TPM. Then we propose several cooperation mechanisms between TPM and the 
hybrid GA. 



4.2 An Adaptive Genetic/Memetic Algorithm (AGMA) 

In order to optimize solutions of the FSP, the AGMA algorithm has been proposed
in [3]. AGMA is first of all a genetic algorithm (GA) which uses an adaptive
selection between mutation operators. Crossover, selection and diversification
operators are described in [2]. Moreover, AGMA proposes an original hybrid
approach: the search alternates adaptively between a Genetic Algorithm and a
Memetic Algorithm (MA). The hybridization works as follows. Let $P_{PO^*}$ be the
modification rate of the Pareto front $PO^*$, computed over the
last generations of the GA. If this value falls below a threshold $\alpha$, the MA is
launched on the current GA population. When the MA is over, the Pareto front
is updated, and the GA is re-run with the previous population (Algorithm 1).

Computational results presented in [3] show that we obtain a good approximation
of the Pareto front. In order to improve these results, we propose some
cooperative schemes between AGMA and TPM.

4.3 Cooperation between AGMA and TPM 

Recently, interest in cooperation methods has been growing. A large part of them are
hybrid methods, in which a first heuristic gives solution(s) to a second one which
upgrades its (their) quality [19]. But different Optimization Methods (OMs) can
Algorithm 1 AGMA algorithm
  Create an initial population
  while run time not reached do
    Make a GA generation with adaptive mutation
    Update PO* and P_PO*
    if P_PO* < α then
      /* Make a generation of MA on the population */
      Apply crossover on randomly selected solutions of PO to create a set of new
        solutions
      Compute the non-dominated set PO' on these solutions
      while new solutions found do
        Create the neighborhood A of each solution of PO'
        Let PO' be the non-dominated set of A ∪ PO'
      end while
      Update PO* and P_PO*
    end if
    Update selection probability of each mutation operator
  end while



cooperate in several ways, as shown in figure 7. This cooperation can be sequential
(a), which is often called hybridization. The search can also alternate between two
OMs (b). The alternation may be triggered by thresholds (c). Finally, a
cooperation can be established with one method integrated in a mechanism of
the second one, with or without a threshold (d).



Fig. 7. Examples of cooperation scheme, panels (a) to (d). Legend: automatic
transition; transition by threshold.



Here we present three cooperation methods that combine the two-phases
method (TPM) and the Adaptive Genetic/Memetic Algorithm (AGMA) presented
above. The first one is an exact method which uses the Pareto set obtained
with AGMA to speed up TPM. But the computational time of TPM still
grows exponentially with the size of the instances. So, for the next approaches,
which run on larger problems, we add constraints to TPM to guide the algorithm,
at the cost of losing the guarantee of obtaining the optimal Pareto set.
These three methods use cooperation scheme (a), but they can also be applied
with the other, more elaborate, cooperation schemes.

Approach 1 - An improved exact approach: Using AGMA solutions
as initial values: In this approach, we run the whole two-phases method. For
every branch-and-bound of the TPM, we use the best solutions given by
the meta-heuristic as initial values. We can therefore cut many nodes of the
branch-and-bound and still find all the optimal solutions with this method.

The time required to solve a given problem is of course smaller if the distance
between the initial front (given by the meta-heuristic) and the optimal front is
small. If the distance between them is zero, the TPM only serves to prove that
the solutions produced by AGMA are optimal.

Even if this approach reduces the time needed to find the exact Pareto front,
it does not allow the size of the problems solved to be increased by much.

Approach 2 - Using TPM as a Very Large Neighborhood Search
(VLNS): Neighborhood search algorithms (also called local search algorithms)
are a wide class of improvement heuristics where at each iteration an improving
solution is found by exploring the "neighborhood" of the current solution.
Ahuja et al. remark that a critical issue in the design of a neighborhood search
is the choice of the neighborhood structure [1]. In general, the larger the
neighborhood, the more effective the neighborhood search. So, VLNS algorithms
consist in exploring an exponentially large neighborhood in practical time to get better
results. In [1], several exponential neighborhood techniques are presented. Here,
we propose to use TPM as a VLNS algorithm.

The idea is to reduce the space explored by the TPM by cutting branches
when the solution under construction is too far from the initial Pareto solution.
An efficient neighborhood operator for the FSP is the insertion operator [3]. So, we
allow TPM to explore only the neighborhood of an initial Pareto solution, which
consists of the solutions that are at most $\delta_{max}$ insertion-operator
applications away from it.

The following example illustrates solution construction with the
VLNS approach from the initial solution abcdefghij. Two sets
of jobs are used: the first one (initialized to {}) represents the current partially
constructed solution and the second one (initialized to {abcdefghij}) represents
the jobs that remain to be placed. During the solution construction, the value $\delta$ (initially
set to 0) is incremented each time the order of the initial solution is broken. If
$\delta = \delta_{max}$, then no further order breaking is allowed, so in this case only one
schedule is explored.

Example (constraint: $\delta_{max} = 2$):

- Initialization: {}, {abcdefghij} (represents: {jobs scheduled}, {jobs to be
placed}).

- We first place the first two jobs: {ab}, {cdefghij}. $\delta = 0$.

- Then we place job g, so we apply the insertion operator on the remaining jobs:
{abg}, {cdefhij}. For the moment, the distance $\delta$ between the initial solution
and the current solution is 1 (one order breaking).

- Then we place jobs c and d: {abgcd}, {efhij}. $\delta = 1$.

- Then we place job h: {abgcdh}, {efij}. $\delta = 2$.

- Here, $\delta = \delta_{max}$, so the last jobs have to be scheduled without breaking the order.
So, the single solution explored is the schedule {abgcdhefij}, {}, with $\delta = 2$.
Other possible schedules are too far from the initial solution.

The size of the insertion neighborhood is in $O(N^2)$, so the size of the space
explored by TPM may be approximated (for $\delta_{max} \ll N$) by $O(N^{2\delta_{max}})$. Hence,
we have to limit the value of $\delta_{max}$, especially on instances with many jobs.
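
The distance used in the VLNS example can be made concrete with a short sketch. It assumes that an order breaking occurs exactly when the job placed next is not the first remaining job of the initial solution, which matches the example but is an interpretation rather than a definition given in the paper. The names `breaking_orders` and `neighbourhood` are illustrative, and jobs are single letters.

```python
from typing import Iterator


def breaking_orders(initial: str, schedule: str) -> int:
    """Count how often `schedule` departs from the job order of `initial`."""
    remaining = list(initial)
    delta = 0
    for job in schedule:
        if job != remaining[0]:  # the next job of the initial order is skipped
            delta += 1
        remaining.remove(job)
    return delta


def neighbourhood(initial: str, delta_max: int) -> Iterator[str]:
    """Enumerate all schedules reachable with at most delta_max order breakings."""

    def extend(prefix: str, remaining: str, delta: int) -> Iterator[str]:
        if not remaining:
            yield prefix
            return
        # Keep the initial order: place the first remaining job, no breaking.
        yield from extend(prefix + remaining[0], remaining[1:], delta)
        # Break the order only while the budget delta_max is not exhausted.
        if delta < delta_max:
            for k in range(1, len(remaining)):
                rest = remaining[:k] + remaining[k + 1:]
                yield from extend(prefix + remaining[k], rest, delta + 1)

    yield from extend("", initial, 0)


if __name__ == "__main__":
    print(breaking_orders("abcdefghij", "abgcdhefij"))
    print(sorted(neighbourhood("abc", 1)))
```

In the TPM itself the pruning would happen inside the branch-and-bound rather than by explicit enumeration; the generator only shows which schedules remain admissible for a given $\delta_{max}$.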



Approach 3 - A local optimization with TPM: This third cooperation
limits the size of the explored space by restricting it to partitions of the Pareto
solutions proposed by AGMA. So TPM is applied on regions of Pareto solutions.

The main goal of this approach is to limit the size of the trees obtained by
the TPM, in order to apply it to large instances. In this section, we
present a non-exact two-phases method in which we only explore a region of
the decision space. We select partitions of each solution obtained by the AGMA
algorithm. Then we explore all the solutions obtained by modifying these
partitions, using the two-phases method. After having explored a partition for
the whole Pareto set, we extract the new Pareto set from the obtained solutions.

This Simple Partitioning Post Optimization Branch & Bound (SPPOBB)
works as follows:

The two-phases method explores the tree by placing jobs either at the
beginning or at the end of the schedule. So, if the partition is defined from job $X_i$
to job $X_j$, it places jobs $0..X_i - 1$ at the beginning of the schedule and jobs
$X_j + 1..N$ at the end. Then it explores the remaining solutions of the tree by
using the two-phases method technique.

Figure 8 shows an example of hybridization by the two-phases method; it
can be applied to other branch and bound methods. In this figure, we consider
an initial solution a, b, ..., i, j, which is on the Pareto front obtained by the AGMA
algorithm. In this example, N = 10, the partition size is 4, and the partition covers
jobs number 4 to 7, i.e. jobs d, e, f, g in the schedule. The first step consists in
placing the first three jobs at the beginning of the schedule. Then, the
last three jobs are placed at the end of the schedule (a job j placed at the end is denoted
by -j). Then, we apply the two-phases method on the remaining jobs. After
cutting several nodes, 5 complete schedules have been explored:

- a,b,c,-j,-i,-h,d,-e,f,g which corresponds to the schedule abcdfgehij

- a,b,c,-j,-i,-h,d,-g,e,f which corresponds to the schedule abcdefghij (the initial
solution)

- a,b,c,-j,-i,-h,d,-g,f,e which corresponds to the schedule abcdfeghij

- a,b,c,-j,-i,-h,g,-d,-e,f which corresponds to the schedule abcgfedhij

- a,b,c,-j,-i,-h,g,-d,-f,e which corresponds to the schedule abcgefdhij
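
The placement notation used in this list can be decoded mechanically. The sketch below is only an illustration of the notation (the function name `decode` is an assumption): unsigned jobs are appended to the front part of the schedule in order, while jobs written -j fill the end of the schedule from the last position backwards.

```python
from typing import List


def decode(placements: List[str]) -> str:
    """Turn a placement sequence such as ['a', '-j', 'b'] into a schedule."""
    front: List[str] = []  # jobs placed at the beginning, in placement order
    back: List[str] = []   # jobs placed at the end, the last position first
    for token in placements:
        if token.startswith("-"):
            back.append(token[1:])
        else:
            front.append(token)
    # The first job placed at the end occupies the very last position.
    return "".join(front + back[::-1])


if __name__ == "__main__":
    for sequence in ("a,b,c,-j,-i,-h,d,-e,f,g",
                     "a,b,c,-j,-i,-h,g,-d,-f,e"):
        print(sequence, "->", decode(sequence.split(",")))
```

Applied to the five sequences above, this reproduces the five schedules listed in the text.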

Parameters: 

Several parameters have to be set in order to obtain an efficient search without
an excessive computational time.
Fig. 8. Example: one partition exploration (initial solution a, b, ..., j with the
partition delimited by $X_i$ and $X_j$)



- Size of the partitions: The cardinality of the Pareto set obtained with the AGMA
algorithm varies between several tens and two hundred solutions. In order
to obtain solutions rapidly, we limit the size of the partitions to 15 jobs for the
10-machine instances, and 12 jobs for the 20-machine instances. So each
two-phases execution can be solved in several seconds or minutes.

- Number of partitions for each solution: Enough partitions of the complete
schedule have to be considered for each job to be treated at least once by the TPM
approach. Moreover, it is useful to overlap consecutive partitions, so as to
allow several moves of the same job during optimization. A job which
is scheduled early can then be moved to the end of the schedule by successive
moves. On the other hand, the more partitions we have, the more processing
time is needed. So we take 8 partitions for the 50-job instances, 16 partitions
for the 100-job instances and 32 partitions for the 200-job instances.

5 Results 

We test the first approach to prove the optimality of Pareto fronts on small instances.
This approach reduces the time needed by the TPM to solve these instances
exactly. Then we test the last two approaches. Results are comparable on the 50-job
instances, but the computational time of the VLNS approach is exponential,
so we present here only the results obtained with SPPOBB. However,
the other approaches give some perspectives on cooperation mechanisms.

In this part, we first present performance indicators to evaluate the effectiveness
of this approach. Then we apply these indicators to compare the fronts obtained
before and after cooperation with SPPOBB.

5.1 Quality Assessment of Pareto Set Approximation 

The quality of solutions can be assessed in different ways. We can observe the
progress graphically, as in figures 9 and 10. Here, we use the contribution metric
[13] to evaluate the proportion of Pareto solutions given by each front, and the
S metric, as suggested in [10], to evaluate the dominated area.





Fig. 9. SPPOBB results: instance with 100 jobs and 10 machines.
Fig. 10. SPPOBB results: instance with 200 jobs and 10 machines.
(Both plots compare the AGMA Pareto set and the SPPOBB Pareto set; the
horizontal axis is the tardiness.)


Contribution metric: The contribution of a set of solutions $PO_1$ relative to
a set of solutions $PO_2$ is the ratio of non-dominated solutions produced by $PO_1$
in $PO^*$, where $PO^*$ is the set of Pareto solutions of $PO_1 \cup PO_2$.

- Let PO be the set of solutions in $PO_1 \cap PO_2$.

- Let $W_1$ (resp. $W_2$) be the set of solutions in $PO_1$ (resp. $PO_2$) that dominate
some solutions of $PO_2$ (resp. $PO_1$).

- Let $L_1$ (resp. $L_2$) be the set of solutions in $PO_1$ (resp. $PO_2$) that are
dominated by some solutions of $PO_2$ (resp. $PO_1$).

- Let $N_1$ (resp. $N_2$) be the other solutions of $PO_1$ (resp. $PO_2$):
$N_i = PO_i \setminus (PO \cup W_i \cup L_i)$.

$Cont(PO_1 / PO_2) = \frac{\frac{\|PO\|}{2} + \|W_1\| + \|N_1\|}{\|PO^*\|}$
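
The contribution metric can be computed directly from these set definitions. The following is a minimal sketch, assuming minimization and that each input is itself a non-dominated set of objective vectors; the function names are illustrative.

```python
from typing import Iterable, Set, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """Pareto dominance for minimization."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def contribution(po1: Iterable[Point], po2: Iterable[Point]) -> float:
    """Cont(PO1 / PO2) = (|PO|/2 + |W1| + |N1|) / |PO*|."""
    s1: Set[Point] = set(po1)
    s2: Set[Point] = set(po2)
    common = s1 & s2                                               # PO
    w1 = {x for x in s1 if any(dominates(x, y) for y in s2)}       # W1
    l1 = {x for x in s1 if any(dominates(y, x) for y in s2)}       # L1
    n1 = s1 - (common | w1 | l1)                                   # N1
    union = s1 | s2
    po_star = {x for x in union
               if not any(dominates(y, x) for y in union)}         # PO*
    return (len(common) / 2 + len(w1) + len(n1)) / len(po_star)


if __name__ == "__main__":
    front_a = [(1, 5), (3, 3)]
    front_b = [(3, 3), (2, 6), (5, 1)]
    print(contribution(front_a, front_b), contribution(front_b, front_a))
```

For two non-dominated input sets the two contributions add up to 1, so a value above 0.5 indicates that the first front contributes more to $PO^*$ than the second.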



S metric: A definition of the S metric is given in [22]. Let PO be a
non-dominated set of solutions. The S metric calculates the hyper-volume of the
multi-dimensional region enclosed by PO and a reference point $Z_{ref}$.

Let $PO_1$ and $PO_2$ be two sets of solutions. To evaluate the quality of $PO_1$ against
$PO_2$, we compute the ratio $(S(PO_1) - S(PO_2)) / S(PO_2)$. For the evaluation,
the reference point is the one with the worst value on each objective among all
the Pareto solutions found over the runs.
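
For two objectives the S metric reduces to an area, which can be computed by a simple sweep. The sketch below assumes minimization and that every point is no worse than the reference point on both objectives; the names `s_metric` and `s_ratio` are illustrative and this is not the implementation used for the experiments.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]


def s_metric(front: Iterable[Point], ref: Point) -> float:
    """Area dominated by `front` and bounded by the reference point `ref`."""
    area = 0.0
    best_f2 = ref[1]
    # Sweep by increasing first objective; each point that improves the second
    # objective adds a horizontal slice reaching up to the reference point.
    for f1, f2 in sorted(front):
        if f2 < best_f2:
            area += (ref[0] - f1) * (best_f2 - f2)
            best_f2 = f2
    return area


def s_ratio(po1: Iterable[Point], po2: Iterable[Point], ref: Point) -> float:
    """Relative improvement (S(PO1) - S(PO2)) / S(PO2)."""
    s2 = s_metric(po2, ref)
    return (s_metric(po1, ref) - s2) / s2


if __name__ == "__main__":
    better = [(1, 3), (2, 2), (3, 1)]
    worse = [(2, 3), (3, 2)]
    print(s_metric(better, (4, 4)), s_ratio(better, worse, (4, 4)))
```

Dominated points are skipped by the sweep, so the inputs do not need to be filtered beforehand.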



5.2 Computational Results 

We use the S and contribution metrics to measure the improvements obtained on the fronts.
Tests were carried out with 10 runs per instance, on a 1.6 GHz machine. Tables 2 and
3 show the results obtained for these metrics.
Table 2. Quality assessment (contribution metric): C(SPPOBB/AGMA)

Problem   ta_50_10_01   ta_50_20_01   ta_100_10_01   ta_100_20_01   ta_200_10_01
C Min     0.54          0.51          0.96           0.73           1.00
C Max     0.63          0.55          1.00           0.96           1.00
Average   0.594         0.525         0.986          0.876          1.000
Std dev   0.026         0.015         0.015          0.062          0.000



Table 3. Quality assessment (S metric): S(SPPOBB)/S(AGMA)

Problem   ta_50_10_01   ta_50_20_01   ta_100_10_01   ta_100_20_01   ta_200_10_01
S Min     0.02%         0.01%         0.75%          0.28%          8.35%
S Max     0.46%         0.27%         2.10%          1.92%          15.57%
Average   0.185%        0.093%        1.199%         0.970%         13.094%
Std dev   0.122%        0.095%        0.387%         0.412%         1.974%



Table 2 shows that the improvements obtained on the 50×10 and 50×20 instances were
generally small. In fact we have an average improvement of 18.8 per cent
of the initial Pareto set for the 50×10 instance, and 4.8 per cent for the 50×20
instance. For the other problems, a large part of the new Pareto set dominates
the initial set of Pareto solutions. Table 3 shows a good progression of the Pareto
front for large problems, especially for the instance with 200 jobs and 10 machines.

Table 4 shows that the time required to carry out the set of two-phases executions is almost
regular, despite the branch & bound approach.



Table 4. Run time

Problem   ta_50_10_01   ta_50_20_01   ta_100_10_01   ta_100_20_01   ta_200_10_01
T Min     3h16'         5h33'         28h11'         37h26'         63h24'
T Max     5h26'         6h19'         52h44'         65h02'         122h45'
Average   4h19'         5h49'         35h54'         50h25'         90h53'
Std dev   42'           25'           8h05'          7h58'          17h16'



6 Conclusion and Perspectives 

In this paper, we have first presented an exact approach and a metaheuristic
approach to solve MOPs. These approaches have been applied to a BOFSP.
Then we have proposed original approaches to upgrade metaheuristic results by
using an exact method, namely the two-phases method. These approaches were tested,
and their effectiveness was shown by the improvements obtained on Pareto fronts
produced by the AGMA algorithm. These results show the interest of this type of
method, which can be improved by adding other mechanisms to explore a large
region of the search space without exploring a great part of the solutions. In the
future, cooperation could be made in a hybrid way to combine the partitioning
and the VLNS approaches. Another way for cooperation between evolutionary
and exact approaches, without considering partitions of optimal solutions, is to
extract information from these solutions in order to reduce the size of the
search space sufficiently.
Abstract. The SIMPLE MAX-CUT problem is as follows: given a graph,
find a partition of its vertex set into two disjoint sets, such that the
number of edges having one endpoint in each set is as large as possible.
A split graph is a graph whose vertex set admits a partition into a stable
set and a clique. The SIMPLE MAX-CUT decision problem is known to be
NP-complete for split graphs. An indifference graph is the intersection
graph of a set of unit intervals of the real line. We show that the SIMPLE
MAX-CUT problem can be solved in linear time for a graph that is both
split and indifference. Moreover, we also show that for each constant
q, the SIMPLE MAX-CUT problem can be solved in polynomial time for
(q, q − 4)-graphs. These are graphs for which no set of at most q vertices
induces more than q − 4 distinct $P_4$'s.



1 Introduction 

The MAXIMUM CUT problem (or the maximum bipartite subgraph problem) 
asks for a bipartition of the graph (with edge weights) with a total weight as large 
as possible. In this paper we consider only the simple case, i.e., all edges in the 
graph have weight one. Then the objective of this simple max-cut problem is 
to delete a minimum number of edges such that the resulting graph is bipartite. 
Making a graph bipartite with few edge deletions has many applications [26] . A 
very recent one is found in the emerging field of SNP (single nucleotide poly- 
morphism) analysis in computational molecular biology, e.g., see [11,27]. Aiming 
for efficient algorithms, we only consider the unweighted case since the classes of 
graphs we consider in this paper contain all complete graphs and the (weighted) 
MAXIMUM CUT problem is NP-complete for every class of graphs containing all 
complete graphs [21,26]. 

As SIMPLE MAX-CUT is NP-complete in general, there are basically two lines 
of research to cope with its computational hardness. First, one may study 
polynomial-time approximation algorithms (it is known to be approximable 
within 1.1383, see [13]) or try to develop exact (exponential-time) algorithms 
(see [15] for an algorithm running in time $2^{m/3} \cdot m^{O(1)}$, where $m$ is the number 
of edges in the graph). Approximation and exact algorithms both have their 
drawbacks, i.e., non-optimality of the gained solution or poor running time even 
for relatively small problem instance sizes. Hence, the second line of research — as 
pursued in this paper — is to determine and analyze special graph structures that 
make it possible to solve the problem efficiently and optimally. This leads to the 
study of special graph classes. (Have a look at the classics [14,9] for general infor- 
mation on numerous graph classes.) For example, it was shown that the simple 
MAX-CUT problem remains NP-complete for cobipartite graphs, split graphs, and 
graphs with chromatic number three [6]. On the positive side, the problem can 
be efficiently solved for cographs [6], line graphs [1], planar graphs [24,16], and 
for graphs with bounded treewidth [29]. 

In this paper we consider two classes of graphs, both of which possess nice 
decomposition properties which we make use of in the algorithms for simple 
MAX-CUT to be described. Also, both graph classes we study are related to 
split graphs. An indifference graph is the intersection graph of a set of unit 
intervals of the real line. (See [23] for more information on intersection graphs 
and their applications in biology and other fields.) A split graph is a graph whose 
vertex set admits a partition into a stable set and a clique. Ortiz, Maculan, 
and Szwarcfiter [25] characterized graphs that are both split and indifference 
in terms of their maximal cliques, and used this characterization to edge-colour 
those graphs in polynomial time. First, we show that this characterization also 
leads to a linear-time solution for the simple max-cut problem for graphs that 
are both split and indifference. 

Second, we study the class of $(q, q-4)$-graphs (also known as graphs with 
few $P_4$'s [4] and introduced in [2]). These are graphs for which no set of at 
most $q$ vertices induces more than $q - 4$ distinct $P_4$'s. (A $P_4$ is a path with four 
vertices.) In this terminology, the cographs are exactly the $(4, 0)$-graphs. The 
$(5, 1)$-graphs are called $P_4$-sparse graphs. Jamison and Olariu [20] showed 
that $(q, q-4)$-graphs allow a nice decomposition tree similar to that of cographs. 
This decomposition can be used to find fast solutions for several problems that are 
NP-complete in general (see, e.g., [3,22]). Also using this decomposition, we show that 
the SIMPLE MAX-CUT problem can be solved in polynomial time for $(q, q-4)$-graphs 
for every constant $q$. 



2 Preliminaries 

In this paper, $G$ denotes a simple, undirected, finite, connected graph, and $V(G)$ 
and $E(G)$ are respectively the vertex and edge sets of $G$. The vertex-set size is 
denoted by $|V(G)| = N$, and $K_N$ denotes the complete graph on $N$ vertices. A 
stable set (or independent set) is a set of vertices pairwise non-adjacent in $G$. 
A clique is a set of vertices pairwise adjacent in $G$. A maximal clique of $G$ is a 
clique not properly contained in any other clique. A subgraph of $G$ is a graph 
$H$ with $V(H) \subseteq V(G)$ and $E(H) \subseteq E(G)$. For $X \subseteq V(G)$, we denote by $G[X]$ 
the subgraph induced by $X$, that is, $V(G[X]) = X$ and $E(G[X])$ consists of those 
edges of $E(G)$ having both ends in $X$. 

Given nonempty subsets $X$ and $Y$ of $V(G)$, the symbol $(X, Y)$ denotes the 
subset $\{xy \in E(G) : x \in X, y \in Y\}$ of $E(G)$. A cut $\mathcal{K}$ of a graph $G$ is the set of 
edges $(S, V(G) \setminus S)$, defined by a subset $S \subseteq V(G)$. We often write $\overline{S}$ instead of 
$V(G) \setminus S$. We also write $\delta(S)$ for the set of edges with exactly one endpoint in $S$ 
(and the other endpoint in $V(G) \setminus S$). By $|\mathcal{K}|$ we denote the number of edges in the 
cut $\mathcal{K}$ and $\ell(\mathcal{K})$ is the number of edges in $E(G) \setminus \mathcal{K}$, i.e., the number of edges that 
are lost by the cut $\mathcal{K}$. A max-cut $\mathcal{K}$ is a cut such that $|\mathcal{K}|$ is as large as possible. 
The (simple) max-cut problem considers the computation of two complementary 
parameters of a graph $G$: $mc(G) = \max\{|\mathcal{K}| : \mathcal{K} \text{ is a cut of } G\} = \max_{S \subseteq V} |\delta(S)|$, 
the maximum number of edges in a cut of $G$; and $\ell(G) = |E(G)| - mc(G)$, 
the minimum number of edges lost by a cut of $G$ (making the remaining graph 
bipartite). Instead of calculating $mc(G)$ directly it is sometimes more convenient 
to calculate first, for $i = 1, \ldots, N$, the values $mc(G, i) = \max_{S \subseteq V, |S| = i} |\delta(S)|$. 
In the sequel, the following observations will be helpful. 

Remark 1. For $K_N$, the complete graph on $N$ vertices, we have: 

- If $(S, \overline{S})$ is a max-cut of $K_N$, then $|S| = \lfloor N/2 \rfloor$; 

- $mc(K_N) = \lfloor N/2 \rfloor \cdot \lceil N/2 \rceil$. 

We say that a max-cut in a complete graph is a balanced cut. 
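
The definitions of $mc(G)$ and $mc(G, i)$ can be made concrete with a small exhaustive computation. The following is a minimal sketch, assuming a graph given as a vertex count and an edge list; the names `cut_size`, `mc_table`, `mc`, and `complete_graph` are illustrative and not taken from the paper. The enumeration is exponential in $N$ and is only meant for checking tiny cases such as Remark 1.

```python
from itertools import combinations


def cut_size(edges, side):
    """Number of edges with exactly one endpoint in `side`, i.e. |delta(S)|."""
    return sum(1 for u, v in edges if (u in side) != (v in side))


def mc_table(num_vertices, edges):
    """Return a list t with t[i] = mc(G, i): the best cut over all S with |S| = i."""
    vertices = range(num_vertices)
    return [
        max(cut_size(edges, set(side)) for side in combinations(vertices, i))
        for i in range(num_vertices + 1)
    ]


def mc(num_vertices, edges):
    """mc(G): the maximum of mc(G, i) over all sizes i."""
    return max(mc_table(num_vertices, edges))


def complete_graph(num_vertices):
    """Edge list of K_N on vertices 0..N-1."""
    return list(combinations(range(num_vertices), 2))


if __name__ == "__main__":
    for size in range(1, 8):
        # Remark 1 predicts mc(K_N) = floor(N/2) * ceil(N/2).
        predicted = (size // 2) * ((size + 1) // 2)
        print(size, mc(size, complete_graph(size)), predicted)
```

For each $N$ from 1 to 7, the last two printed columns can be compared directly: the brute-force value and the closed form of Remark 1.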

Remark 2. Let $H$ be a subgraph of a graph $G$ and let $\mathcal{K}$ be a cut of $G$. If 
$\ell(\mathcal{K}) = \ell(H)$, then $\mathcal{K}$ is a max-cut of $G$. 

Proof. Since $H$ is a subgraph of $G$, any cut $\mathcal{N}$ of $G$ satisfies $\ell(\mathcal{N}) \geq \ell(H) = \ell(\mathcal{K})$. 
Hence $\mathcal{K}$ is a cut of minimum loss in $G$, in other words, $\mathcal{K}$ is a max-cut of $G$. ■ 

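Remark 3. Let $|V(G)| = N$ and let $S$ be a subset of $V(G)$ satisfying: 

- $|S| = \lfloor N/2 \rfloor$; 

- every vertex of $S$ is adjacent to every vertex of $\overline{S}$. 

Then $(S, \overline{S})$ is a max-cut of $G$. 

Proof. Clearly the cut $(S, \overline{S})$ has $\lfloor N/2 \rfloor \cdot \lceil N/2 \rceil$ edges, the maximum possible size of 
a cut in $G$. ■ 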

The union of two graphs $G_1$ and $G_2$, denoted by $G_1 \cup G_2$, is the graph such 
that $V(G_1 \cup G_2) = V(G_1) \cup V(G_2)$ and $E(G_1 \cup G_2) = E(G_1) \cup E(G_2)$. By way 
of contrast, $G_1 \setminus G_2$ denotes the subgraph of $G_1$ induced by $V(G_1) \setminus V(G_2)$. The 
(disjoint) sum of two graphs $G_1$ and $G_2$ makes every vertex of $G_1$ adjacent to 
every vertex of $G_2$. 
3 Linear-Time Solution for Split-Indifference Graphs 

Some Preliminaries 

An interval graph is the intersection graph of a set of intervals of the real line 
(cf. [9,23] for general expositions). In case of unit intervals the graph is called 
unit interval, proper interval, or indifference graph. We shall adopt the latter 
name, to be consistent with the terminology of indifference orders, defined next. 
(For a recent proof that the class of unit interval graphs coincides with that of 
the proper interval graphs, see [8].) Indifference graphs can be characterized as 
those interval graphs without an induced claw (i.e., a $K_{1,3}$). Indifference graphs 
can also be characterized by a linear order: their vertices can be linearly ordered 
so that the vertices contained in the same maximal cliques are consecutive [28]. 
We call such an order an indifference order. 

A split graph is a graph whose vertex set can be partitioned into a stable set 
and a clique. A split-indifference graph is a graph that is both split and indiffer- 
ence. We shall use the following characterization of split-indifference graphs in 
terms of their maximal cliques due to [25]. 

Theorem 1. Let $G$ be a connected graph. Then $G$ is a split-indifference graph 
if and only if 

- $G = K_n$, or 

- $G = K_m \cup K_n$, where $n \geq m > 1$, and $K_m \setminus K_n = K_1$, or 

- $G = K_m \cup K_n \cup K_l$, where $n \geq m > 1$, $n \geq l > 1$, and $K_m \setminus K_n = K_1$, 
$K_l \setminus K_n = K_1$. Moreover, $V(K_m) \cap V(K_l) = \emptyset$ or $V(K_m) \cup V(K_l) = V(G)$. 

This characterization was applied to obtain a polynomial-time algorithm to 
edge colour split-indifference graphs [25] . In the sequel, we show how to apply this 
characterization to obtain a linear-time algorithm to solve the max-cut problem 
for split-indifference graphs. 



The Balanced Cut Is not Always Maximal 

A natural approach [7] for solving max-cut for indifference graphs is the following. 
Let $v_1, v_2, \ldots, v_N$ be an indifference order for $G$ and define $\mathcal{K} = (S, \overline{S})$ 
as follows: Place in $S$ all vertices with odd labels and place in $\overline{S}$ the remaining 
vertices (i.e., those with even labels). By definition of $\mathcal{K}$ and by Remark 1, 
$\mathcal{K} \cap E(M)$ is a max-cut of $M$, for every graph $M$ induced by a maximal clique 
of $G$. This natural approach defines a cut that is locally balanced, i.e., it gives 
a cut that is a max-cut with respect to each maximal clique. The following example 
shows that $\mathcal{K}$ is not necessarily a max-cut of $G$. Consider the indifference 
graph $G$ with five (ordered) vertices $v_1, v_2, v_3, v_4, v_5$, where $\{v_1, v_2, v_3, v_4\}$ induce 
a $K_4$, and $\{v_3, v_4, v_5\}$ induce a $K_3$. Note that the cut $(\{v_1, v_3, v_5\}, \{v_2, v_4\})$ has 
5 edges, whereas the cut $(\{v_1, v_2, v_5\}, \{v_3, v_4\})$ has 6 edges. Therefore, this approach 
works only when the indifference graph $G$ has only one maximal clique, 
i.e., when $G$ is a complete graph, which covers the first point in Theorem 1. 
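
The five-vertex counterexample is small enough to check exhaustively. The sketch below is only an illustration: the 0-based vertex numbering and the helper names `cut_size` and `brute_force_max_cut` are assumptions introduced here. It evaluates the odd/even cut of the natural approach, the better cut quoted in the text, and a brute-force optimum.

```python
from itertools import combinations

# v1..v5 are numbered 0..4: {0,1,2,3} induce a K4 and {2,3,4} induce a K3.
EDGES = set(combinations([0, 1, 2, 3], 2)) | set(combinations([2, 3, 4], 2))


def cut_size(edges, side):
    """Number of edges with exactly one endpoint in `side`."""
    return sum(1 for u, v in edges if (u in side) != (v in side))


def brute_force_max_cut(num_vertices, edges):
    """Try every subset S and return (best value, one best S)."""
    best_value, best_side = -1, set()
    for mask in range(1 << num_vertices):
        side = {v for v in range(num_vertices) if (mask >> v) & 1}
        value = cut_size(edges, side)
        if value > best_value:
            best_value, best_side = value, side
    return best_value, best_side


# Odd labels v1, v3, v5: the locally balanced cut of the natural approach.
natural_value = cut_size(EDGES, {0, 2, 4})
# The cut ({v1, v2, v5}, {v3, v4}) from the text.
better_value = cut_size(EDGES, {0, 1, 4})
optimum, _ = brute_force_max_cut(5, EDGES)

print(natural_value, better_value, optimum)
```

The three printed numbers correspond to the locally balanced cut, the alternative cut, and the true optimum, in that order.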
Let $G = K_n \cup K_m$, where $|V(K_n) \cap V(K_m)| = i$. Call $K_i$ the graph induced 
by the vertices of the intersection. We say that a cut $\mathcal{K}$ of $G$ is compatible if: 

a) $\mathcal{K} \cap E(K_n)$ is a max-cut of $K_n$ and $\mathcal{K} \cap E(K_m)$ is a max-cut of $K_m$; 

b) Among all cuts $\mathcal{K}$ of $G$ satisfying condition a), $|\mathcal{K} \cap E(K_i)|$ is minimal. 

Clearly, the cut proposed by the natural approach satisfies condition a) but 
not necessarily condition b) of the definition of a compatible cut. For the 
example above, the compatible cut gives the maximum cut. However, our subsequent 
study of the max-cut problem for graphs with two maximal cliques shows 
that it is not always possible to define a max-cut which is a compatible cut for 
the graph. We actually show that there are graphs for which the max-cut is not 
balanced with respect to any maximal clique of the graph. 

In the sequel, we show how to use this approach (considering cuts $\mathcal{K}$ such 
that locally $\mathcal{K} \cap E(M)$ is a max-cut of $M$, for every graph $M$ induced by a 
maximal clique) to find first a max-cut in a graph with two maximal cliques 
(which covers the second point in Theorem 1) and then to find a max-cut in a 
split-indifference graph (by dealing with the third point in Theorem 1). 

Graphs with Two Maximal Cliques 

In this section we consider general graphs with precisely two maximal cliques. 
Note that a graph with precisely two maximal cliques is necessarily an indiffer- 
ence graph but not necessarily a split graph. 

Lemma 1. Let $G = K_n \cup K_m$ with $n \geq m > i \geq 1$, where $|V(K_n) \cap V(K_m)| = i$. 
Call $K_i$ the graph induced by the vertices of the intersection. Let $(S, \overline{S})$ be a cut 
of $G$. Let $x = |S \cap V(K_i)|$. Suppose $x \leq \lfloor i/2 \rfloor$. Then, the maximum value of a cut 
$(S, \overline{S})$ having $x$ vertices in $S \cap V(K_i)$ is obtained by placing the vertices outside 
the intersection $K_i$ as follows: 

- Place in $S$ the largest possible number that is less than or equal to $\lceil n/2 \rceil - x$ 
of vertices of $K_n \setminus K_i$; 

- Place in $S$ the largest possible number that is less than or equal to $\lceil m/2 \rceil - x$ 
of vertices of $K_m \setminus K_i$. 

Proof. Let $\mathcal{N} = (S, \overline{S})$ be a cut of $G$. Since $G$ contains two maximal cliques, i.e., 
$G = K_n \cup K_m$, with $|V(K_n) \cap V(K_m)| = i$, we may count the number of edges 
in the cut $\mathcal{N}$ as follows: 

$|\mathcal{N}| = |\mathcal{N} \cap E(K_n)| + |\mathcal{N} \cap E(K_m)| - |\mathcal{N} \cap E(K_i)|.$ 

Now because $x = |S \cap V(K_i)|$, we have $|\mathcal{N} \cap E(K_i)| = x(i - x)$. Hence, by 
placing the vertices outside the intersection $K_i$ as described, we get a cut as 
close as possible to the balanced cut with respect to both $K_n$ and $K_m$. ■ 

By using the notation of Lemma 1, let $M(x)$ be the number of edges of a 
maximum cut of $G$ having $x$ vertices of $K_i$ in $S$. By Lemma 1, $M(x)$ is well defined 
as a function of $x$ in the interval $[0, \lfloor i/2 \rfloor]$. We consider three cases according 
to the relation between $i$ and $\lceil n/2 \rceil$, and $i$ and $\lceil m/2 \rceil$. In each case, our goal is to 
find the values of $x$ in the interval $[0, \lfloor i/2 \rfloor]$ which maximize $M(x)$. 
Case 1: $i \leq \lceil m/2 \rceil$. In this case, $x \leq \lceil m/2 \rceil$ and $i - x \leq \lceil m/2 \rceil$. Hence, vertices 
outside the intersection can be placed accordingly to get balanced partitions for 
both $K_n$ and $K_m$. Then $M(x)$ is equal to $M_1(x)$, which is defined as follows: 
$M_1(x) = \lceil n/2 \rceil \lfloor n/2 \rfloor + \lceil m/2 \rceil \lfloor m/2 \rfloor - x(i - x)$. We want to maximize $M_1(x)$ over the 
interval $[0, \lfloor i/2 \rfloor]$. In this case, we have just one maximum, which occurs at $x = 0$. 

Case 2: $\lceil m/2 \rceil < i \leq \lceil n/2 \rceil$. In this case, we still have $x \leq \lceil m/2 \rceil$, but not necessarily 
$i - x \leq \lceil m/2 \rceil$. If $i - x \leq \lceil m/2 \rceil$, then the function $M(x)$ is equal to $M_1(x)$ above. 
Otherwise, $i - x > \lceil m/2 \rceil$, and it is not possible to get a balanced partition with 
respect to $K_m$. By Lemma 1, the maximum cut in this case is obtained by placing 
all vertices of $K_m \setminus K_i$ in $S$. Therefore, the function $M(x)$ is 

$M_2(x) = \lceil n/2 \rceil \lfloor n/2 \rfloor + (m - i)(i - x)$ for $0 \leq x < i - \lceil m/2 \rceil$, 

$M_1(x) = \lceil n/2 \rceil \lfloor n/2 \rfloor + \lceil m/2 \rceil \lfloor m/2 \rfloor - x(i - x)$ for $i - \lceil m/2 \rceil \leq x \leq \lfloor i/2 \rfloor$. 

It is easy to see that $M(x)$ is a function that is continuous and decreasing 
with maximum at $x = 0$. 

Case 3: $\lceil m/2 \rceil \leq \lceil n/2 \rceil < i$. In this case, we distinguish three intervals for $i - x$ 
to be in: 

If $i - x \leq \lceil m/2 \rceil$, then vertices outside the intersection can be placed accordingly 
to get balanced partitions for both $K_n$ and $K_m$, and $M(x) = M_1(x)$. 

If $\lceil m/2 \rceil < i - x \leq \lceil n/2 \rceil$, then only $K_n$ gets a balanced partition and $M(x) = M_2(x)$. 

Finally, if $i - x > \lceil n/2 \rceil$, then a maximum cut is obtained by placing all vertices 
outside the intersection in $S$ and we get a new function $M_3(x)$. 

Therefore, a complete description of the function $M(x)$ is 

$M_3(x) = (i - x)(n + m - 2i + x)$ for $0 \leq x < i - \lceil n/2 \rceil$, 

$M_2(x) = \lceil n/2 \rceil \lfloor n/2 \rfloor + (m - i)(i - x)$ for $i - \lceil n/2 \rceil \leq x < i - \lceil m/2 \rceil$, 

$M_1(x) = \lceil n/2 \rceil \lfloor n/2 \rfloor + \lceil m/2 \rceil \lfloor m/2 \rfloor - x(i - x)$ for $i - \lceil m/2 \rceil \leq x \leq \lfloor i/2 \rfloor$. 

Observe that this function is also continuous but not always decreasing. The 
function $M_3(x)$ is a parabola with apex at $x = i - N/2$, where $N = m + n - i$ 
is the total number of vertices of $G$. For this reason, we distinguish two cases, 
according to the relation between $i$ and $N$, as follows: $M(x)$ has maximum at 
$x = 0$ when $i \leq \lfloor N/2 \rfloor$, and $M(x)$ has maximum at $x = i - N/2$ when $i > \lfloor N/2 \rfloor$. 
Since $x$ takes integer values on the interval $[0, \lfloor i/2 \rfloor]$, we have two possible values for $x$ in 
this case: the maximum cut has either $i - \lfloor N/2 \rfloor$ or $i - \lceil N/2 \rceil$ vertices of $K_i$ in $S$. 
In summary, we have shown: 

Theorem 2. Let $G = K_n \cup K_m$ with $n \geq m > i \geq 1$, where $|V(K_n) \cap V(K_m)| = i$. 
Call $K_i$ the graph induced by the vertices of the intersection. Let $v_1, v_2, \ldots, v_N$ 
be an indifference order of $G$ such that vertices $v_1, v_2, \ldots, v_n$ induce a $K_n$, vertices 
$v_{n-i+1}, v_{n-i+2}, \ldots, v_n$ induce a $K_i$ containing the vertices of the intersection, 
and $v_{n-i+1}, v_{n-i+2}, \ldots, v_N$ induce a $K_m$. A maximum cut of $G$ is obtained as 
follows: 
- If $i \leq \lceil m/2 \rceil \leq \lceil n/2 \rceil$, then the compatible cut $(S, \overline{S})$ that places in $S$ the first 
$\lceil n/2 \rceil$ vertices, and the last $\lceil m/2 \rceil$ vertices, contains zero edges of $K_i$, and is a 
maximum cut of $G$. 

- If $\lceil m/2 \rceil < i \leq \lceil n/2 \rceil$, then the cut $(S, \overline{S})$ that places in $S$ the first $\lceil n/2 \rceil$ vertices, 
and the last $m - i$ vertices, contains zero edges of $K_i$, is not a compatible 
cut, and is a maximum cut of $G$. 

- If $\lceil m/2 \rceil \leq \lceil n/2 \rceil < i$, then we distinguish two cases. If $i \leq \lfloor N/2 \rfloor$, then the cut 
$(S, \overline{S})$ that places in $S$ the first $n - i$ vertices, and the last $m - i$ vertices, 
contains zero edges of $K_i$, is not a compatible cut, and is a maximum cut 
of $G$. If $i > \lfloor N/2 \rfloor$, then the cut $(S, \overline{S})$ that places in $\overline{S}$ only $\lfloor N/2 \rfloor$ vertices of the 
intersection is not a compatible cut, and is a maximum cut of $G$. 
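
The case analysis behind Theorem 2 can be cross-checked numerically. The sketch below is an illustration under stated assumptions, not the authors' implementation: it evaluates $M(x)$ directly from the placement rule of Lemma 1 (rather than from the piecewise formulas), takes the maximum over $x \in [0, \lfloor i/2 \rfloor]$, and compares the result with exhaustive search. The function names and the 0-based vertex layout are hypothetical.

```python
from itertools import combinations


def best_clique_cut(clique_size, inter_size, x):
    """Best cut of one clique when x intersection vertices are already in S.

    Following Lemma 1, add as many outside vertices to S as possible
    without exceeding ceil(clique_size / 2) clique vertices in S.
    """
    outside = clique_size - inter_size
    target = (clique_size + 1) // 2
    in_s = x + min(outside, max(0, target - x))
    return in_s * (clique_size - in_s)


def M(n, m, i, x):
    """Edges of a best cut with exactly x vertices of K_i in S."""
    # Edges inside K_i are counted by both cliques, so subtract them once.
    return best_clique_cut(n, i, x) + best_clique_cut(m, i, x) - x * (i - x)


def mc_two_cliques(n, m, i):
    """Max-cut value of K_n united with K_m sharing i vertices."""
    return max(M(n, m, i, x) for x in range(i // 2 + 1))


def brute_force(n, m, i):
    """Exhaustive max-cut on the same graph, for cross-checking only."""
    total = n + m - i
    # K_n lives on 0..n-1 and K_m on n-i..total-1; they overlap in i vertices.
    edges = set(combinations(range(n), 2)) | set(combinations(range(n - i, total), 2))
    return max(
        sum(1 for u, v in edges if ((mask >> u) & 1) != ((mask >> v) & 1))
        for mask in range(1 << total)
    )


if __name__ == "__main__":
    # The five-vertex example: K_4 and K_3 sharing two vertices.
    print(mc_two_cliques(4, 3, 2), brute_force(4, 3, 2))
```

Evaluating $M(x)$ for every admissible $x$ costs $O(i)$ arithmetic operations; the closed-form case distinction in the text removes even that loop.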



Split-Indifference Graphs with Three Maximal Cliques 

In this section we consider split-indifference graphs with precisely three maximal 
cliques. By Theorem 1, any such graph $G = K_m \cup K_n \cup K_l$, with $n \geq m$, $n \geq l$, 
satisfies $K_m \setminus K_n = \{1\}$, $K_l \setminus K_n = \{t\}$, i.e., the vertex set $V(G) = V(K_n) \cup \{1, t\}$. 
In other words, we have $|V(G)| = N = n + 2$. In addition, there exists an 
indifference order for $G$ having vertex 1 first, vertex $t$ last, and the remaining 
vertices between 1 and $t$. 

To obtain a solution for the max-cut problem for a split-indifference graph 
with precisely three maximal cliques, we shall consider three cases. 



Case 1: vertex 1 is adjacent to at most $\lfloor n/2 \rfloor$ vertices or vertex $t$ is 
adjacent to at most $\lfloor n/2 \rfloor$ vertices. In the preceding subsection we studied 
the case of two maximal cliques. In particular, we got the easy case that if a 
graph $H = K_n \cup K_m$, with $n \geq m$ and such that $K_m \setminus K_n = \{1\}$, then there 
exists a max-cut of $H$ that places on the same side the $\lceil n/2 \rceil$ vertices that are 
closer to vertex 1 with respect to the indifference order of $H$, and places vertex 
1 and the remaining vertices on the opposite side. 

Now suppose vertex $t$ is adjacent to at most $\lfloor n/2 \rfloor$ vertices. Let $(S, V(H) \setminus S)$ be 
a max-cut of $H$ that places all neighbours of $t$ on the same side $S$. By Remark 2, 
$(S, (V(H) \setminus S) \cup \{t\})$ is a max-cut of the entire graph $G$, because $(S, (V(H) \setminus S) \cup \{t\})$ 
loses the same number of edges as the cut $(S, V(H) \setminus S)$. 



Case 2: both vertices 1 and $t$ are adjacent to at least $\lceil n/2 \rceil$ vertices but 
there are not $\lfloor N/2 \rfloor$ vertices adjacent to both 1 and $t$. Note that every 
vertex of $K_n$ is adjacent to 1 or to $t$. Let $S$ contain vertex 1 and a set of $\lceil n/2 \rceil$ 
neighbours of $t$ that includes all nonneighbours of 1. The only “missing” edge in 
the cut $(S, \overline{S})$ is the edge $1t$, an edge not present in $G$. Since there are not $\lfloor N/2 \rfloor$ 
vertices adjacent to both 1 and $t$, it is not possible to define a cut for $G$ larger 
than $(S, \overline{S})$ by placing vertices 1 and $t$ on the same side. 
Case 3: there exist $\lfloor N/2 \rfloor$ vertices adjacent to both 1 and $t$. Let $S$ be a 
set of $\lfloor N/2 \rfloor$ vertices adjacent to both 1 and $t$. Remark 3 justifies $(S, \overline{S})$ to be a 
max-cut of $G$. 

Theorem 3. Let $G$ be a split-indifference graph with three maximal cliques $K_m$, 
$K_n$, and $K_l$, with $n \geq m$, $n \geq l$, and satisfying $K_m \setminus K_n = \{1\}$, $K_l \setminus K_n = \{t\}$. 
Let $v_1, v_2, \ldots, v_N$ be an indifference order of $G$ having vertex 1 first, vertex $t$ 
last. A maximum cut of $G$ is obtained as follows: 

- If vertex $t$ is adjacent to at most $\lfloor n/2 \rfloor$ vertices, then the cut $(S, \overline{S})$ that places 
in $S$ vertex 1 and the $\lfloor n/2 \rfloor$ vertices that are closer to $t$ with respect to the 
indifference order is a maximum cut of $G$. An analogous result follows if 
vertex 1 is adjacent to at most $\lfloor n/2 \rfloor$ vertices. 

- If both vertices 1 and $t$ are adjacent to at least $\lceil n/2 \rceil$ vertices but there are 
not $\lfloor N/2 \rfloor$ vertices adjacent to both 1 and $t$, then the cut $(S, \overline{S})$ that places 
in $S$ vertex 1 and the $\lceil n/2 \rceil$ vertices that are closer to $t$ with respect to the 
indifference order is a maximum cut of $G$. 

- If there exist $\lfloor N/2 \rfloor$ vertices adjacent to both 1 and $t$, then the cut $(S, \overline{S})$ that 
places in $S$ a set of $\lfloor N/2 \rfloor$ vertices adjacent to both 1 and $t$ is a maximum cut 
of $G$. 
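
The three cases can be exercised on small instances. The sketch below is hedged in two ways. First, the thresholds it uses ($\lfloor n/2 \rfloor$ for the first case, $\lceil n/2 \rceil$ for the second, $\lfloor N/2 \rfloor$ common neighbours for the third) are assumptions chosen to be consistent with Remark 3 and with the case analysis. Second, the graph model is hypothetical: vertex 0 plays the role of vertex 1, vertices 1..n form $K_n$ in indifference order, vertex n+1 plays the role of $t$, and `a`, `b` are the numbers of neighbours of vertex 1 and of $t$.

```python
from itertools import combinations


def three_clique_graph(n, a, b):
    """K_n on 1..n, vertex 0 adjacent to the first a, vertex n+1 to the last b."""
    edges = set(combinations(range(1, n + 1), 2))
    edges |= {(0, v) for v in range(1, a + 1)}
    edges |= {(v, n + 1) for v in range(n - b + 1, n + 1)}
    return n + 2, edges


def cut_size(edges, side):
    return sum(1 for u, v in edges if (u in side) != (v in side))


def case_based_cut(n, a, b):
    """Return the side S chosen by the three-case rule (assumed thresholds)."""
    total, t = n + 2, n + 1
    common = max(0, a + b - n)            # vertices adjacent to both ends
    if common >= total // 2:              # third case: complete bipartite cut
        first = n - b + 1
        return set(range(first, first + total // 2))
    if b <= n // 2:                       # first case, stated for t
        return {0} | set(range(n - n // 2 + 1, n + 1))
    if a <= n // 2:                       # first case, mirrored for vertex 1
        return {t} | set(range(1, n // 2 + 1))
    half_up = (n + 1) // 2                # second case
    return {0} | set(range(n - half_up + 1, n + 1))


def brute_force(total, edges):
    return max(
        cut_size(edges, {v for v in range(total) if (mask >> v) & 1})
        for mask in range(1 << total)
    )


if __name__ == "__main__":
    total, edges = three_clique_graph(4, 3, 3)
    print(cut_size(edges, case_based_cut(4, 3, 3)), brute_force(total, edges))
```

Because the case is determined by comparing `a`, `b`, and `a + b - n` with fixed thresholds, the value of the cut follows in constant time once these quantities are known, which is the idea behind the linear-time bound.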

Altogether, we obtain the following main result. 

Corollary 1. Simple max-cut can be solved in linear time for split-indifference 
graphs. 

Proof. The result directly follows from combining Theorem 1 with Remark 1, 
Theorem 2, and Theorem 3. ■ 



4 Polynomial-Time Solution for $(q, q-4)$-Graphs 

Some preliminaries. A graph is a $(q, t)$-graph if no set of at most $q$ vertices 
induces more than $t$ distinct $P_4$'s. The cographs are exactly the $(4, 0)$-graphs, 
i.e., cographs are graphs without induced $P_4$. The class of so-called 
$P_4$-sparse graphs coincides with the $(5, 1)$-graphs. The class of $P_4$-sparse graphs 
was extensively studied in [17,18,19,12]. 

It was shown in [3] that many problems can be solved efficiently for $(q, q-4)$-graphs 
for each constant $q$. These results make use of a decomposition theorem 
which we state below. In this section we show that this decomposition can also be 
used to solve the simple max-cut problem. In order to state the decomposition 
theorem for $(q, q-4)$-graphs we need some preliminaries. 

Recall that a split graph is a graph of which the vertex set can be split into 
two sets $K$ and $I$ such that $K$ induces a clique and $I$ induces an independent 
set in G. A spider is a split graph consisting of a clique and an independent 
set of equal size (at least two) such that each vertex of the independent set has 
precisely one neighbor in the clique and each vertex of the clique has precisely 
one neighbor in the independent set, or it is the complement of such a graph. 
We call a spider thin if every vertex of the independent set has precisely one 
neighbor in the clique. A spider is thick if every vertex of the independent set is 
non-adjacent to precisely one vertex of the clique. The smallest spider is a path 
with four vertices (i.e., a P4) and this spider is at the same time both thick and 
thin. 

The SIMPLE MAX-CUT problem is easy to solve for spiders: 

Remark 4. Let $G$ be a thin spider with $2n$ vertices where $n \geq 3$. Then $mc(G) = 
\lfloor n/2 \rfloor \lceil n/2 \rceil + n$. If $G$ is a thick spider then $mc(G) = n(n - 1)$. 
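
The two closed forms for spiders can be checked by brute force on small cases. The sketch below is illustrative only: the constructors `thin_spider` and `thick_spider` and the vertex numbering (clique on 0..n-1, independent set on n..2n-1) are assumptions made here, and the thin-spider formula tested is $\lfloor n/2 \rfloor \lceil n/2 \rceil + n$, i.e. a balanced cut of the clique plus all $n$ pendant edges.

```python
from itertools import combinations


def thin_spider(n):
    """Clique on 0..n-1; leaf n+j is adjacent only to clique vertex j."""
    edges = set(combinations(range(n), 2))
    edges |= {(j, n + j) for j in range(n)}
    return 2 * n, edges


def thick_spider(n):
    """Clique on 0..n-1; leaf n+j is adjacent to every clique vertex except j."""
    edges = set(combinations(range(n), 2))
    edges |= {(k, n + j) for j in range(n) for k in range(n) if k != j}
    return 2 * n, edges


def max_cut(num_vertices, edges):
    """Exhaustive max-cut value; exponential, for small checks only."""
    return max(
        sum(1 for u, v in edges if ((mask >> u) & 1) != ((mask >> v) & 1))
        for mask in range(1 << num_vertices)
    )


if __name__ == "__main__":
    for n in (3, 4, 5):
        thin_formula = (n // 2) * ((n + 1) // 2) + n
        thick_formula = n * (n - 1)
        print(n, max_cut(*thin_spider(n)), thin_formula,
              max_cut(*thick_spider(n)), thick_formula)
```

For a thick spider, placing the whole clique on one side and the whole independent set on the other already achieves $n(n - 1)$ edges.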

A graph $G$ is p-connected if for every partition into two non-empty sets there 
is a crossing $P_4$, that is a $P_4$ with vertices in both sets of the partition. The 
p-connected components of a graph are the maximal induced subgraphs which 
are p-connected. A p-connected graph is separable if there is a partition $(V_1, V_2)$ 
such that every crossing $P_4$ has its midpoints in $V_1$ and its endpoints in $V_2$. 

Recall that a module is a non-trivial (i.e., not $\emptyset$ or $V$) set of vertices which 
have equivalent neighborhoods outside the set. The characteristic of a graph is 
obtained by shrinking the non-trivial modules to single vertices. It can be shown 
(see [2,20]) that a p-connected graph is separable if and only if its characteristic 
is a split graph. 

Our main algorithmic tool is the following structural theorem due to [20]. 

Theorem 4. For an arbitrary graph $G$ exactly one of the following holds: 

- $G$ or $\overline{G}$ is disconnected. 

- There is a unique proper separable p-connected component $H$ of $G$ with separation 
$(V_1, V_2)$ such that every vertex outside $H$ is adjacent to all vertices 
of $V_1$ and to none of $V_2$. 

- $G$ is p-connected. 

Furthermore, the following characterization of p-connectedness for $(q, q-4)$-graphs 
was obtained in [2] (also see [4]). 

Theorem 5. Let $G = (V, E)$ be a $(q, q-4)$-graph which is p-connected. Then 
either $|V| < q$ or $G$ is a spider. 

Theorem 4 and Theorem 5 lead to a binary decomposition tree for $(q, q-4)$-graphs 
(also see [3] for more details). This decomposition tree can be found in 
linear time [5]. The leaves of this tree correspond with spiders or graphs with 
less than q vertices (this reflects the last point of Theorem 4 and Theorem 5). 
The internal nodes of this tree have one of three possible labels. If the label of 
an internal node is 0 or 1, then the graph corresponding with this node is the 
disjoint union or the sum of the graphs corresponding with the children of the 
node (this reflects the first point of Theorem 4). If the label of the node is 2 
(this reflects the second point of Theorem 4), one of the graphs, w.l.o.g. $G_1$, 
has a separation $(V_1^1, V_1^2)$ and it is either a spider or a graph with less than $q$ 
vertices of which the characteristic is a split graph (Theorems 4 and 5), and $G_2$ 
is arbitrary. If $G_1$ is a spider, all vertices of $G_2$ are made adjacent exactly to 
all vertices of the clique (induced by $V_1^1$) of $G_1$. If $G_1$ is a graph of which the 
characteristic is a split graph, all vertices of $G_2$ are made adjacent exactly to all 
vertices (i.e., $V_1^1$) of every clique module of $G_1$. 

In the following subsections we briefly describe the method to compute the 
simple max-cut for graphs with few $P_4$'s. The main idea of the algorithm is 
that we compute for each node of the decomposition tree all relevant values of 
$mc(G', i)$, $G'$ being the graph corresponding with this node. The table of values 
for such a node is computed, given the tables of the children of the node. In the 
subsequent paragraphs, we discuss the methods to do this, for each of the types 
of nodes in the decomposition tree. Once we have the table of the root node, 
i.e., all values $mc(G, i)$, we are done. 

Cographs. We review the algorithm for the simple max-cut problem for 
cographs (i.e., $(4, 0)$-graphs) which was published in [6]. A cograph which is not 
a single vertex is either the sum or the union of two (smaller) cographs. In other 
words: cographs have a decomposition tree with all internal nodes labelled 0 or 1. 

Lemma 2. Let $G = (V, E)$ be the union of $G_1 = (V_1, E_1)$ and $G_2 = (V_2, E_2)$. 
Then 

$mc(G, i) = \max\{ mc(G_1, j) + mc(G_2, i - j) : 0 \leq j \leq i \wedge |V_1| \geq j \wedge |V_2| \geq i - j \}.$ 

Let $G = (V, E)$ be the sum of $G_1 = (V_1, E_1)$ and $G_2 = (V_2, E_2)$. Then 

$mc(G, i) = \max\{ mc(G_1, j) + mc(G_2, i - j) + j(|V_2| - (i - j)) + (|V_1| - j)(i - j) : 0 \leq j \leq i \wedge |V_1| \geq j \wedge |V_2| \geq i - j \}.$ 

Corollary 2. There exists an $O(N^2)$ time algorithm to compute the simple max-cut 
of a cograph. 
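
The two recurrences of Lemma 2 translate into a short table-merging routine. The following is a minimal sketch, not the published algorithm of [6]: the function names `leaf` and `combine` and the list-based table layout (`table[i]` holds $mc(G, i)$) are assumptions made for illustration. The double loop over `j` and `k = i - j` is equivalent to maximizing over $j$ for each $i$ under the size constraints of the lemma.

```python
NEG_INF = float("-inf")


def leaf():
    """Table of a single vertex: mc(G, 0) = mc(G, 1) = 0."""
    return [0, 0]


def combine(table1, table2, is_sum):
    """Merge the tables of G1 and G2 following Lemma 2.

    is_sum=False models the disjoint union (no edges between G1 and G2);
    is_sum=True models the sum (every G1 vertex adjacent to every G2 vertex).
    """
    n1, n2 = len(table1) - 1, len(table2) - 1
    result = [NEG_INF] * (n1 + n2 + 1)
    for j in range(n1 + 1):          # j vertices of G1 placed in S
        for k in range(n2 + 1):      # k = i - j vertices of G2 placed in S
            value = table1[j] + table2[k]
            if is_sum:
                # Crossing edges between G1 and G2 that are cut.
                value += j * (n2 - k) + (n1 - j) * k
            result[j + k] = max(result[j + k], value)
    return result


if __name__ == "__main__":
    pair = combine(leaf(), leaf(), is_sum=False)      # two isolated vertices
    path3 = combine(leaf(), pair, is_sum=True)        # K_{1,2}, a path on 3 vertices
    cycle4 = combine(pair, pair, is_sum=True)         # K_{2,2}, a 4-cycle
    print(path3, cycle4, max(cycle4))
```

Merging two tables costs $O(|V_1| \cdot |V_2|)$ operations, so each pair of vertices is charged at the single tree node where the two vertices are first joined.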

$P_4$-sparse graphs. The decomposition tree (as defined above) for graphs that 
are $P_4$-sparse has nodes with label 0, 1, or 2 [17]. Note that in the case of 
label 2 we can assume here that the graph $G_1$ is a spider (see the discussion 
after Theorems 4 and 5, and [18]). In the lemma below, we assume $G$ is obtained 
from $G_1$ and $G_2$ as described above by the operation of a 2-labeled node. Let $K$ 
be the clique and $S$ be the independent set of $G_1$. Let $n_i$ denote the number of 
vertices of $G_i$. Note that every vertex of $G_2$ is adjacent to every vertex of $K$. 

Lemma 3. Let $G$, $G_1$, $G_2$, $S$, and $K$ be as above. Let $G_1$ be a thick spider. 
Then 

$mc(G, i) = \max\{ mc(G_2, j) + j(|K| - j') + j'(n_2 - j) + (i - j - j')(|K| - 1) : 0 \leq j, j' \leq i \}.$ 

Let $G_1$ be a thin spider. Then 

$mc(G, i) = \max\{ mc(G_2, j) + j(|K| - j') + j'(n_2 - j) + (i - j - j') : 0 \leq j, j' \leq i \}.$ 

For $P_4$-sparse graphs (i.e., $(5, 1)$-graphs), Lemmas 2 and 3 are sufficient to 
compute all the values $mc(G', i)$ for all graphs associated with nodes in the 
decomposition tree. Thus, we obtain: 

Corollary 3. There exists an 0{N^) time algorithm to compute the simple max- 
cut for a $P_4$-sparse graph. 



$|V_1| < q$ and the characteristic of $G_1$ is a split graph. If we have a 
decomposition tree of $(q, q-4)$-graphs, then there is one remaining case: $G$ is 
obtained from $G_1$ and $G_2$ by the operation corresponding to a 2-labeled node, 
and $G_1$ has less than $q$ vertices. In this case the vertex set of $G_2$ acts as a module, 
i.e., every vertex of $G_2$ has exactly the same set of neighbors in $G_1$. Let $K$ be 
the set of vertices of $G_1$ which are adjacent to all vertices of $G_2$. 

Let $mc(G_1, j, j')$ be the maximum cut in $G_1$ with exactly $j$ vertices in $K$ and 
$j'$ vertices in $V_1 - K$. Since $G_1$ is of constant size, the numbers $mc(G_1, j, j')$ can 
easily be computed in constant time. 

Lemma 4. Let $G$, $G_1$, $G_2$, $K$ be as above. Suppose that $|V_1| < q$ and the characteristic 
of $G_1$ is a split graph. Then 

$mc(G, i) = \max\{ mc(G_2, j) + mc(G_1, j', i - j - j') + j(|K| - j') + j'(n_2 - j) + (i - j - j') : 0 \leq j, j' \leq i \}.$ 

Now, with Lemma 4, and Lemmas 2 and 3, we obtain: 

Theorem 6. There exists an O(N^) time algorithm for the simple max-cut 
problem on $(q, q-4)$-graphs for each constant $q$. 

5 Concluding Remarks 

This paper considers two classes of graphs: split-indifference graphs and $(q, q-4)$-graphs. 
Both classes possess nice decomposition properties which we make use 
of in the described algorithms for simple max-cut. Also, both graph classes we 
study are related to split graphs, a class of graphs for which simple max-cut 
is known to be hard. 

A linear-time algorithm for the recognition of indifference graphs was pre- 
sented by de Figueiredo et al. [10]. The algorithm partitions in linear time the 
vertex set of an indifference graph into sets of twin vertices, i.e., vertices of the 
graph that belong to the same set of maximal cliques. 

Given a graph G with a bounded number of maximal cliques, the partition 
of G into sets of twins contains a bounded number k of sets. Hence, we can 
compute $mc(G)$ in polynomial time, by maximizing a function on $k$ variables 
$x_i$ that assume integer values in a limited region of the space, i.e., on a finite 
domain. This simple argument establishes the polynomial upper bound $O(N^k)$ 
for the max-cut problem for a class of graphs with a bounded number of maximal 
cliques. 

One goal of this paper was to establish a linear time upper bound for the 
computation of $mc(G)$ for a split-indifference graph $G$, by computing the value 
of $mc(G)$ in constant time, given that we can in linear time determine which 
case of the computation we are in. We leave it as an open problem to extend the 
proposed solution to the whole class of indifference graphs. 

Another goal reached by this paper was to extend to the whole class of 
$(q, q-4)$-graphs the known solution of simple max-cut for cographs. We leave 
it as an open problem to find a more efficient polynomial-time algorithm for the 
computation of $mc(G)$ for a $(q, q-4)$-graph $G$. 
Abstract. We propose a new strategy for solving the non-bijective 
graph matching problem in model-based pattern recognition. The search 
for the best correspondence between a model and an over-segmented 
image is formulated as a combinatorial optimization problem, defined 
by the relational attributed graphs representing the model and the im- 
age where recognition has to be performed, together with the node and 
edge similarities between them. A randomized construction algorithm is 
proposed to build feasible solutions to the problem. Two neighborhood 
structures and a local search procedure for solution improvement are 
also proposed. Computational results are presented and discussed, illus- 
trating the effectiveness of the combined approach involving randomized 
construction and local search. 



1 Introduction 

The recognition and the understanding of complex scenes require not only a 
detailed description of the objects involved, but also of the spatial relationships 
between them. Indeed, the diversity of the forms of the same object in different 
instantiations of a scene, and also the similarities of different objects in the 
same scene, make relationships between objects of prime importance in order to 
disambiguate the recognition of objects with similar appearance. Graph based 
representations are often used for scene representation in image processing [6,9, 
11,20,21]. Vertices of the graphs usually represent the objects in the scenes, while 
their edges represent the relationships between the objects. Relevant information 
for the recognition is extracted from the scene and represented by relational 
attributed graphs. In model-based recognition, both the model and the scene 
are represented by graphs. 

The assumption of a bijection between the elements in two instantiations 
of the same scene is too strong for many problems. Usually, the model has a 
schematic aspect. Moreover, the construction of the image graph often relies on 
segmentation techniques that may fail in accurately segmenting the image into 
meaningful entities. Therefore, no isomorphism can be expected between both 
graphs and, in consequence, scene recognition may be better expressed as a 
non-bijective graph matching problem. 

Our motivation comes from an application in medical imaging, in which the 
goal consists in recognizing brain structures from 3D magnetic resonance im- 
ages, previously processed by a segmentation method. The model consists of an 
anatomical atlas. A graph is built from the atlas, in which each node represents 
exactly one anatomical structure of interest. Edges of this graph represent spatial 
relationships between the anatomical structures. Inaccuracies constitute one of 
the main characteristics of the problem. Objects in the image are segmented and 
all difficulties with object segmentation will be reflected in the representation, 
such as over-segmentation, unexpected objects found in the scene (pathologies 
for instance), expected objects not found and deformations of objects [13]. Also, 
the attributes computed for the image and the model may be imprecise. To illustrate 
these difficulties, Figure 1 presents slices of three different volumes: (a) 
a normal brain, (b) a pathological brain with a tumor, and (c) the representa- 
tion of a brain atlas where each grey level corresponds to a unique connected 
structure. Middle dark structures (lateral ventricles) are much bigger in (b) than 
in (a). The white hyper-signal structure (tumor) does not appear in the atlas 
(c) nor in the normal brain (a). Similar problems occur in other applications, 
such as aerial or satellite image interpretation using a map, face recognition, and 
character recognition. 







Fig. 1. Examples of magnetic resonance images: (a) axial slice of a normal brain, (b) 
axial slice of a pathological brain with a tumor, and (c) axial slice of a brain atlas. 



This paper focuses on algorithms for the non-bijective graph matching prob- 
lem [1,7,10,13,15,17,19], which is defined by the relational attributed graphs 
representing the model and the over-segmented image, together with the node 
and edge similarities between their nodes and edges. Section 2 describes our for- 
mulation of the search for the best correspondence between the two graphs as 
a non-bijective graph matching problem. We discuss the nature of the objective 
function and of the constraints of the graph matching problem. A randomized 
construction algorithm is proposed in Section 3 to build feasible solutions. 
Besides the quality of the solutions found, this algorithm may also be used as a 
robust generator of initial solutions for a GRASP metaheuristic [16] or for pop- 
ulation methods such as the genetic algorithm described in [14]. A local search 
algorithm is proposed in Section 4 to improve the solutions obtained by the 
construction algorithm. Numerical results obtained with the randomized con- 
struction and the local search algorithms are presented and discussed in Section 
5. They illustrate the robustness of the construction algorithm and the improve- 
ments attained by the local search algorithm in terms of solution quality and 
object identification. Concluding remarks are presented in the last section. 



2 Non-bijective Graph Matching 

Attributed graphs are widely used in pattern recognition problems. The defini- 
tion of the attributes and the computation of their values are specific to each 
application and problem instance. Fuzzy attributed graphs are used for recog- 
nition under imprecisions [2,3,4,5,12,13,14,15]. The construction of a fuzzy at- 
tributed graph depends on the imperfections of the scene or of the reference 
model, and on the attributes of the object relations. The common point is that 
there is always a single vertex for each region of each image. Differences may 
occur due to the strategy applied for the creation of the edge set, as a result 
of the chosen attributes or of defects in scene segmentation. Once the graph is 
built, the next step consists in computing the attributes of vertices and edges. 
Finally, vertex-similarity and edge-similarity matrices are computed from the 
values of the attributed graphs, relating each pair of vertices and each pair of 
edges, one of them from the model and the other from the segmented image. 

Two graphs are used to represent the problem: $G_1 = (N_1, E_1)$ represents the 
model, while $G_2 = (N_2, E_2)$ represents the over-segmented image. In each case, 
$N_i$ denotes the vertex set and $E_i$ denotes the edge set, $i \in \{1,2\}$. We assume that 
$|N_1| < |N_2|$, which is the case when the image is over-segmented with respect to 
the model. 

A solution to the non-bijective graph matching problem is defined by a set 
of associations between the nodes of $G_1$ and $G_2$. Each node of $G_2$ is associated 
with one node of $G_1$. These assignments are represented by binary variables: 
$x_{ij} = 1$ if nodes $i \in N_1$ and $j \in N_2$ are associated, $x_{ij} = 0$ otherwise. The set 
$A(i) = \{j \in N_2 \mid x_{ij} = 1\}$ denotes the subset of vertices of $N_2$ associated with 
vertex $i \in N_1$. To ensure that the structure of $G_1$ appears within $G_2$, we favor 
solutions where a correspondence between edges also implies a correspondence 
between their extremities (edge association condition). Thus, edge associations 
are derived from vertex associations, according to the following rule: edge 
$(a,b) \in E_1$ is associated with all edges $(a',b') \in E_2$ such that (i) $a' \in N_2$ is 
associated with $a \in N_1$ and $b' \in N_2$ is associated with $b \in N_1$, or (ii) $a' \in N_2$ 
is associated with $b \in N_1$ and $b' \in N_2$ is associated with $a \in N_1$. 

A good matching is a solution in which the associations correspond to high 
similarity values. Similarity matrices are constructed from similarity values 
calculated from graph attributes. The choice of these attributes depends on the 
images. Let $S^v$ (resp. $S^e$) denote an $|N_1| \times |N_2|$ (resp. $|E_1| \times |E_2|$) vertex-similarity 
(resp. edge-similarity) matrix, where the elements $s^v(i,j)$ (resp. $s^e((i,i'),(j,j'))$) 
$\in [0,1]$ represent the similarity between vertices (resp. edges) $i \in N_1$ and $j \in N_2$ 
(resp. $(i,i') \in E_1$ and $(j,j') \in E_2$). The value of any solution is expressed by an 
objective function, defined for each solution $x$ as 

$$f(x) = \frac{\alpha}{|N_1| \cdot |N_2|} f^v(x) + \frac{1-\alpha}{|E_1| \cdot |E_2|} f^e(x),$$

with 

$$f^v(x) = \sum_{i \in N_1} \sum_{j \in N_2} \left(1 - |x_{ij} - s^v(i,j)|\right)$$

and 

$$f^e(x) = \sum_{(i,i') \in E_1} \sum_{(j,j') \in E_2} \left(1 - \left|\max\{x_{ij} x_{i'j'}, x_{ij'} x_{i'j}\} - s^e((i,i'),(j,j'))\right|\right),$$

where $\alpha$ is a parameter used to weight each term of $f$. This function consists of 
two terms which represent the vertex and edge contributions to the measure of 
the solution quality associated with each correspondence. Vertex and edge 
associations with high similarity values are privileged, while those with low similarity 
values are penalized. The first term represents the average vertex contribution to 
the correspondence. The second term represents the average edge contribution 
to the correspondence and acts to enforce the edge association condition. For 
instance, if $s^e((i,i'),(j,j'))$ is high and there are associations between the 
extremities of edges $(i,i')$ and $(j,j')$, then $\max\{x_{ij} x_{i'j'}, x_{ij'} x_{i'j}\} = 1$ and the edge 
contribution is high. On the contrary, if the extremities of edges $(i,i')$ and $(j,j')$ 
are not associated, then $\max\{x_{ij} x_{i'j'}, x_{ij'} x_{i'j}\} = 0$ and the edge contribution 
is null. This function behaves appropriately when the image features are well 
described by the graph attributes. 
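The objective function above can be evaluated directly from its definition. The following is a minimal sketch, not the authors' C implementation: the function name `objective`, the dictionary layout chosen for the edge similarities, and the tiny instance are illustrative assumptions.

```python
from itertools import product


def objective(x, s_v, s_e, E1, E2, alpha):
    """Evaluate f(x) = alpha/(|N1||N2|) * f_v + (1-alpha)/(|E1||E2|) * f_e.

    x[i][j]   : 1 if model vertex i and image vertex j are associated, else 0
    s_v[i][j] : vertex similarity in [0, 1]
    s_e[(e1, e2)] : edge similarity in [0, 1] for e1 in E1 and e2 in E2
    (this data layout is an assumption made for the sketch)
    """
    n1, n2 = len(s_v), len(s_v[0])
    f_v = sum(1 - abs(x[i][j] - s_v[i][j])
              for i, j in product(range(n1), range(n2)))
    f_e = 0.0
    for (i, ip), (j, jp) in product(E1, E2):
        # Two edges are associated when their extremities are, in either order.
        assoc = max(x[i][j] * x[ip][jp], x[i][jp] * x[ip][j])
        f_e += 1 - abs(assoc - s_e[((i, ip), (j, jp))])
    return (alpha / (n1 * n2)) * f_v + ((1 - alpha) / (len(E1) * len(E2))) * f_e


# Hypothetical instance: 2 model vertices, 3 image vertices on a path 0-1-2.
E1 = [(0, 1)]
E2 = [(0, 1), (1, 2)]
x = [[1, 0, 0], [0, 1, 1]]
s_v = [[0.9, 0.2, 0.1], [0.1, 0.8, 0.7]]
s_e = {((0, 1), (0, 1)): 0.9, ((0, 1), (1, 2)): 0.2}
value = objective(x, s_v, s_e, E1, E2, alpha=0.9)
```

On this toy instance the vertex term sums to 5.0 and the edge term to 1.7, so `value` is about 0.835.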

The search is restricted to solutions in which each vertex of $N_2$ is 
associated with exactly one vertex of $N_1$. The rationale for this condition is that 
image segmentation is performed by an appropriate algorithm which preserves 
the boundaries and, in consequence, avoids situations in which one region of the 
segmented image lies on the boundary between two adjacent regions of the model: 
Constraint (1): For every $j \in N_2$, there exists exactly one node $i \in N_1$ such that 
$x_{ij} = 1$, i.e. $|A^{-1}(j)| = 1$. 

The quality of the input data (vertex and edge similarity matrices) is 
essential for the identification of the best correspondence. However, as this quality is 
not always easy to achieve in real applications, we emphasize some aspects 
that can be used as additional information to improve the search. Vertices of $G_2$ 
associated with the same vertex of $G_1$ should be connected among themselves in 
real situations, since an over-segmentation method can split an object into several 
smaller pieces, but it does not change the positions of the pieces. Regions of the 
segmented image corresponding to the same region of the model should therefore 
be connected. A good strategy to avoid solutions that violate this property is to 
restrict the search to correspondences for which each set $A(i)$ of vertices induces a 
connected subgraph in $G_2$, for every model vertex $i \in N_1$ (connectivity constraint): 
Constraint (2): For every $i \in N_1$, the subgraph induced in $G_2 = (N_2, E_2)$ by $A(i)$ 
is connected. 

Pairs of vertices with null similarity cannot be associated. Such associations 
are discarded by the constraint below, which strengthens the penalization of 
associations between vertices with low similarity values induced by the objective 
function: 

Constraint (3): For every $i \in N_1$ and $j \in N_2$, if $s^v(i,j) = 0$, then $x_{ij} = 0$. 

Finally, to ensure that all objects of the model appear in the image graph, 
one additional constraint is imposed: 

Constraint (4): For every $i \in N_1$, there exists at least one node $j \in N_2$ that is 
associated with $i$ (i.e., $|A(i)| \geq 1$). 
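The four constraints can be checked together on a candidate assignment. The sketch below is only an illustration under stated assumptions: the solution is stored as a list `assign` with `assign[j]` equal to the model vertex matched to image vertex `j` (or `None`), the image graph is given as an adjacency dictionary `adj2`, and the helper names are hypothetical.

```python
from collections import deque


def is_connected(nodes, adj):
    """True if `nodes` is non-empty and induces a connected subgraph of adj."""
    nodes = set(nodes)
    if not nodes:
        return False
    start = next(iter(nodes))
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w in nodes and w not in seen:
                seen.add(w)
                queue.append(w)
    return seen == nodes


def is_feasible(assign, n1, adj2, s_v):
    # Constraint (1): every image vertex has exactly one model vertex.
    if any(i is None for i in assign):
        return False
    groups = {i: [] for i in range(n1)}
    for j, i in enumerate(assign):
        # Constraint (3): pairs with null similarity cannot be associated.
        if s_v[i][j] == 0:
            return False
        groups[i].append(j)
    # Constraint (4): each A(i) is non-empty; constraint (2): it is connected.
    return all(is_connected(g, adj2) for g in groups.values())


# Hypothetical image graph: a path 0 - 1 - 2.
adj2 = {0: [1], 1: [0, 2], 2: [1]}
s_v = [[0.9, 0.2, 0.1], [0.1, 0.8, 0.7]]
ok = is_feasible([0, 1, 1], 2, adj2, s_v)
split = is_feasible([1, 0, 1], 2, adj2, s_v)    # A(1) = {0, 2} is disconnected
missing = is_feasible([0, 0, 0], 2, adj2, s_v)  # A(1) is empty
```

Storing one model vertex per image vertex makes constraint (1) hold by construction once every entry is set, which is why only the `None` test is needed for it.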

3 Randomized Construction Algorithm 

The construction algorithm proposed in this section is based on progressively 
associating a node of $N_1$ with each node of $N_2$, until a feasible solution $x$ is 
built. The objective function $f(x)$ does not have to be evaluated from scratch 
at each iteration. Its value is initialized once and for all and progressively updated 
after each new association between a pair of vertices from $N_1$ and $N_2$. Since 

$$f^v(x) = \sum_{i \in N_1} \sum_{j \in N_2} \left(1 - |x_{ij} - s^v(i,j)|\right) = \sum_{(i,j) \in N_1 \times N_2} \left(1 - s^v(i,j)\right) + \sum_{(i,j):\, x_{ij}=1} \left(2 s^v(i,j) - 1\right),$$

then $f^v(x') = f^v(x) + 2 s^v(i,j) - 1$ for any two solutions $x$ and $x'$ that differ 
only by one additional association between vertices $i \in N_1$ and $j \in N_2$. Similar 
considerations are used in the evaluation of the term $f^e(x)$, which is increased by 
$2 s^e((a,a'),(b,b')) - 1$ whenever a new pair of edges $(a,a') \in E_1$ and $(b,b') \in E_2$ 
are associated. 
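As a small worked instance of this update rule (the value 0.8 is chosen purely for illustration), consider a single pair $(i,j)$ with $s^v(i,j) = 0.8$ that becomes associated:

```latex
\begin{align*}
\text{before } (x_{ij}=0):&\quad 1 - |0 - 0.8| = 0.2,\\
\text{after } (x_{ij}=1):&\quad 1 - |1 - 0.8| = 0.8,\\
\text{change in } f^v:&\quad 0.8 - 0.2 = 0.6 = 2 \cdot 0.8 - 1 .
\end{align*}
```

The change is positive exactly when $s^v(i,j) > 1/2$, so associating a pair with similarity below one half lowers the vertex term.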

The pseudo-code of the RandomizedConstruction algorithm is 
given in Figure 2. The algorithm takes as parameters the initial seed, the 
maximum number MaxTrials of attempts to build a feasible solution before stopping, 
and the maximum number MaxSolutions of solutions built. We denote by $\Gamma_G(j)$ 
the nodes adjacent to vertex $j$ in a graph $G$. The number of attempts, the 
number of solutions built, and the indicator that a feasible solution has been found 
are initialized in line 1. The best value found is initialized in line 2. The loop in lines 
3-35 performs at most MaxTrials attempts to build at most MaxSolutions 
solutions. Lines 4-7 prepare and initialize the data for each attempt. The solution 
$x$, the set $A(i)$ of nodes associated with each node $i \in N_1$, and the node $A^{-1}(j)$ 
associated with each node $j \in N_2$ are initialized in line 4. The terms $f^v$ and $f^e$ 
are initialized respectively in lines 5 and 6. A temporary copy $V_2$ of the node set 
$N_2$ is created in line 7. The loop in lines 8-26 performs one attempt to create a 
feasible solution and stops after the associations to each node in $V_2$ have been 
investigated. A node $j \in V_2$ is randomly selected and eliminated from $V_2$ in line 
9. A temporary copy $V_1$ of the node set $N_1$ is created in line 10. The loop in lines 
11-25 searches for a node in $N_1$ to be associated with node $j \in V_2$. It stops after 
all possible associations to nodes in $N_1$ have been investigated without success 
or if one possible association was found. A node $i \in V_1$ is randomly selected and 
eliminated from $V_1$ in line 12. The algorithm checks in line 13 if node $i$ can be 
associated with node $j$, i.e., if their similarity is not null and if the graph induced 
in $G_2$ by $A(i) \cup \{j\}$ is connected. If this is the case, the current solution and 
its objective function value are updated in lines 14-24. The current solution is 
updated in lines 15-16. The term $f^v$ corresponding to the node similarities is 
updated in line 17. The term $f^e$ corresponding to the edge similarities is updated 
in lines 18-23. The algorithm checks in line 27 if the solution $x$ built in lines 
8-26 is feasible, i.e., if there is at least one node of $N_2$ associated with every 
node of $N_1$ and if there is exactly one node of $N_1$ associated with every node of 
$N_2$. If this is the case, the indicator that a feasible solution was found is set 
in line 29 and the number of feasible solutions built is updated in line 30. The 
value of the objective function for the new solution is computed in line 31. If the 
new solution is better than the incumbent, then the latter is updated in line 32. 
The number of attempts to build a feasible solution is updated in line 34 and a 
new iteration resumes, until the maximum number of attempts is reached. The 
best solution found $x^*$ and its objective function value $f^*$ are returned in line 
36. In case no feasible solution was found, the returned value is $f^* = -\infty$. The 
complexity of each attempt to build a feasible solution is $O(|N_1|^2 \cdot |N_2|^2)$. 

procedure RandomizedConstruction(seed, MaxTrials, MaxSolutions) 
 1.  trials ← 1, solutions ← 1, and feasible ← .FALSE.; 
 2.  f* ← -∞; 
 3.  while trials ≤ MaxTrials and solutions ≤ MaxSolutions do 
 4.      x ← 0, A(i) ← ∅ ∀i ∈ N1, and A⁻¹(j) ← ∅ ∀j ∈ N2; 
 5.      f^v ← Σ_{(i,j) ∈ N1×N2} (1 - s^v(i,j)); 
 6.      f^e ← Σ_{((i,i'),(j,j')) ∈ E1×E2} (1 - s^e((i,i'),(j,j'))); 
 7.      V2 ← N2; 
 8.      while V2 ≠ ∅ do 
 9.          Randomly select a node j from V2 and update V2 ← V2 - {j}; 
10.          V1 ← N1; 
11.          while V1 ≠ ∅ and A⁻¹(j) = ∅ do 
12.              Randomly select a node i from V1 and update V1 ← V1 - {i}; 
13.              if s^v(i,j) > 0 and 
                    the graph induced in G2 by A(i) ∪ {j} is connected 
14.              then do 
15.                  x_ij ← 1; 
16.                  A(i) ← A(i) ∪ {j} and A⁻¹(j) ← {i}; 
17.                  f^v ← f^v + 2 s^v(i,j) - 1; 
18.                  forall i' ∈ Γ_G1(i) do 
19.                      forall j' ∈ Γ_G2(j) do 
20.                          if x_i'j' = 1 
21.                          then f^e ← f^e + 2 s^e((i,i'),(j,j')) - 1; 
22.                      end_forall; 
23.                  end_forall; 
24.              end_if; 
25.          end_while; 
26.      end_while; 
27.      if A(i) ≠ ∅ ∀i ∈ N1 and A⁻¹(j) ≠ ∅ ∀j ∈ N2 
28.      then do 
29.          feasible ← .TRUE.; 
30.          solutions ← solutions + 1; 
31.          Compute f ← α/(|N1|·|N2|) · f^v + (1-α)/(|E1|·|E2|) · f^e; 
32.          if f > f* then update f* ← f and x* ← x; 
33.      end_if; 
34.      trials ← trials + 1; 
35.  end_while; 
36.  return x*, f*; 
end RandomizedConstruction. 

Fig. 2. Pseudo-code of the randomized construction algorithm. 
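The construction procedure can be sketched in a few lines of Python. This is a simplified illustration rather than the implementation evaluated in the paper: it scores solutions by the vertex term only (equivalent to setting $\alpha = 1$), uses Python's `random` module instead of the generator of [18], and all function and variable names are assumptions.

```python
import random
from collections import deque


def connected(nodes, adj):
    """True if the non-empty set `nodes` induces a connected subgraph."""
    start = next(iter(nodes))
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w in nodes and w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) == len(nodes)


def one_attempt(n1, n2, adj2, s_v, rng):
    """One construction attempt: visit image vertices in random order and give
    each the first randomly drawn model vertex that is admissible."""
    A = [set() for _ in range(n1)]   # A[i]: image vertices matched to i
    inv = [None] * n2                # inv[j]: model vertex matched to j
    for j in rng.sample(range(n2), n2):
        for i in rng.sample(range(n1), n1):
            if s_v[i][j] > 0 and connected(A[i] | {j}, adj2):
                A[i].add(j)
                inv[j] = i
                break
    return A, inv


def randomized_construction(n1, n2, adj2, s_v, max_trials, max_solutions, seed):
    rng = random.Random(seed)
    best, best_val, solutions = None, float("-inf"), 0
    for _ in range(max_trials):
        if solutions >= max_solutions:
            break
        A, inv = one_attempt(n1, n2, adj2, s_v, rng)
        # Feasible if every A(i) is non-empty and every j received a vertex.
        if all(A) and all(i is not None for i in inv):
            solutions += 1
            # Vertex term in its incremental form: base value plus (2s - 1).
            f_v = sum(1 - s_v[i][j] for i in range(n1) for j in range(n2))
            f_v += sum(2 * s_v[inv[j]][j] - 1 for j in range(n2))
            val = f_v / (n1 * n2)
            if val > best_val:
                best, best_val = inv, val
    return best, best_val


adj2 = {0: [1], 1: [0, 2], 2: [1]}
s_v = [[0.9, 0.2, 0.1], [0.1, 0.8, 0.7]]
best, best_val = randomized_construction(2, 3, adj2, s_v, 50, 10, seed=0)
```

As in the procedure described above, an attempt can end without a feasible solution (some image vertex left unmatched or some model vertex unused), in which case it is simply discarded and a new attempt starts.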

4 Local Search 

The solutions generated by a randomized construction algorithm are not neces- 
sarily optimal, even with respect to simple neighborhoods. Hence, it is almost 
always beneficial to apply a local search to attempt to improve each constructed 
solution. A local search algorithm works in an iterative fashion by successively 
replacing the current solution by a better solution in the neighborhood of the 
current solution. It terminates when a local optimum is found, i.e., when no 
better solution can be found in the neighborhood of the current solution. 

We define the neighborhood $N^a(x)$ associated with any solution $x$ as formed 
by all feasible solutions that can be obtained by the modification of $A^{-1}(j)$ 
for some $j \in N_2$. For each vertex $j \in N_2$, the candidate set $C(j)$ is formed 
by all vertices in $N_1$ that can replace the node currently associated with $j$, 
i.e. $C(j) = \{k \in N_1 \mid x'$ is a feasible solution, where $x'_{i\ell} = 1$ if $i = k$ and 
$\ell = j$, $x'_{i\ell} = 0$ if $i = A^{-1}(j)$ and $\ell = j$, $x'_{i\ell} = x_{i\ell}$ otherwise$\}$. The number of 
solutions within this neighborhood is bounded by $|N_1| \cdot |N_2|$. 

The pseudo-code of the local search algorithm LS using a first-improving 
strategy based on the neighborhood structure $N^a$ defined above is given in 
Figure 3. The algorithm takes as inputs the solution $x^*$ built by the randomized 
construction algorithm and its objective function value $f^*$. Initializations are 
performed in lines 1-2. The loop in lines 3-32 performs the local search and 
stops at the first local optimum of the objective function with respect to the 
neighborhood defined by the sets $C(j)$. The control variable is initialized at each 
local search iteration in line 4. The loop in lines 5-31 considers each node $j$ of 
graph $G_2$. The replacement of the node $i = A^{-1}(j)$ currently associated with $j$ 
(line 6) by each node belonging to the candidate set $C(j)$ is investigated in the 
loop in lines 7-30. The increase in the value of the objective function due to the 
node similarity contributions is computed in line 8, while that due to the edge 
similarity contributions is computed in lines 9-19. If the total increase in the 
objective function value computed in line 20 is strictly positive (line 21), then 
the current solution and the control variables are updated in lines 22-28. The 
procedure returns in line 33 the local optimum found and the corresponding 
solution value. Each local search iteration within neighborhood $N^a$ has complexity 
$O(|N_1| \cdot |N_2|^2 + |N_2| \cdot |E_2|)$. 

procedure LS(x*, f*) 
 1.  improvement ← .TRUE.; 
 2.  Build sets C(j), ∀j ∈ N2; 
 3.  while improvement do 
 4.      improvement ← .FALSE.; 
 5.      forall j ∈ N2 while .NOT.improvement do 
 6.          i ← A⁻¹(j); 
 7.          forall k ∈ C(j) while .NOT.improvement do 
 8.              Δ^v ← 2·s^v(k,j) - 2·s^v(i,j); 
 9.              Δ^e ← 0; 
10.              forall j' ∈ Γ_G2(j) do 
11.                  forall i' ∈ Γ_G1(i) do 
12.                      if i' = A⁻¹(j') 
13.                      then Δ^e ← Δ^e + 1 - 2·s^e((i,i'),(j,j')); 
14.                  end_forall; 
15.                  forall k' ∈ Γ_G1(k) do 
16.                      if k' = A⁻¹(j') 
17.                      then Δ^e ← Δ^e - 1 + 2·s^e((k,k'),(j,j')); 
18.                  end_forall; 
19.              end_forall; 
20.              Δ ← α/(|N1|·|N2|) · Δ^v + (1-α)/(|E1|·|E2|) · Δ^e; 
21.              if Δ > 0 
22.              then do 
23.                  improvement ← .TRUE.; 
24.                  x*_kj ← 1, x*_ij ← 0; 
25.                  A(i) ← A(i) - {j}; 
26.                  A(k) ← A(k) ∪ {j}; 
27.                  f* ← f* + Δ; 
28.                  Update sets C(j), ∀j ∈ N2; 
29.              end_if; 
30.          end_forall; 
31.      end_forall; 
32.  end_while; 
33.  return x*, f*; 
end LS. 

Fig. 3. Pseudo-code of the basic local search algorithm using neighborhood $N^a$. 

We notice that if $A(i) = \{j\}$ for some $i \in N_1$ and $j \in N_2$ (i.e. $|A(i)| = 1$) 
then $|C(j)| = 0$, because in this case vertex $i$ would not be associated with any 
other vertex. It can also be the case that a node $j \in A(i)$ cannot be associated 
with any other node because $A(i) \setminus \{j\}$ induces a non-connected graph in $G_2$. 
In consequence, in these situations the vertex associated with node $j$ cannot be 
changed by local search within the neighborhood $N^a$, even if there are other 
feasible associations. As an attempt to avoid this situation, we define a second 
neighborhood structure $N^s(x)$ associated with any feasible solution $x$. This is 
a swap neighborhood, in which the associations of two vertices $j', j'' \in N_2$ are 
exchanged. A solution $x' \in N^s(x)$ if there are two vertices $i', i'' \in N_1$ and two 
vertices $j', j'' \in N_2$ such that $x_{i'j'} = 1$, $x_{i''j''} = 1$, $x'_{i'j''} = 1$, and $x'_{i''j'} = 1$, with 
all other associations in solutions $x$ and $x'$ being the same. 

Local search within the swap neighborhood has a higher time complexity 
$O(|N_2|^2 \cdot |E_2|)$ than within neighborhood $N^a$. Also, $|N^s(x)| \gg |N^a(x)|$ for any 
feasible solution $x$. Accordingly, we propose an extended local search procedure 
LS+ which makes use of both neighborhoods. Whenever the basic local search 
procedure LS identifies a local optimum $x^*$ with respect to neighborhood $N^a$, 
the extended procedure starts a local search from the current solution $x^*$ within 
neighborhood $N^s$. If this solution is also optimal with respect to neighborhood 
$N^s$, then the extended procedure stops; otherwise algorithm LS resumes from 
any improving neighbor of $x^*$ within $N^s$. 
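The first-improving search over the reassignment neighborhood can be sketched as follows. This is a simplified illustration and not the procedure of Figure 3 itself: moves are scored by the vertex contribution only (the $\Delta^v$ of line 8), candidate sets are recomputed on the fly instead of being maintained, and the names `local_search`, `inv` and `adj2` are assumptions.

```python
from collections import deque


def connected(nodes, adj):
    """True if `nodes` is non-empty and induces a connected subgraph."""
    if not nodes:
        return False
    start = next(iter(nodes))
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w in nodes and w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) == len(nodes)


def local_search(inv, n1, adj2, s_v):
    """inv[j] is the model vertex currently associated with image vertex j."""
    inv = list(inv)
    improved = True
    while improved:
        improved = False
        for j in range(len(inv)):
            i = inv[j]
            for k in range(n1):
                if k == i or s_v[k][j] == 0:
                    continue
                # k belongs to C(j) only if both A(i) - {j} and A(k) + {j}
                # stay non-empty and connected after the move.
                old = {t for t, a in enumerate(inv) if a == i and t != j}
                new = {t for t, a in enumerate(inv) if a == k} | {j}
                if not (connected(old, adj2) and connected(new, adj2)):
                    continue
                delta_v = 2 * s_v[k][j] - 2 * s_v[i][j]
                if delta_v > 1e-12:          # first improving move is taken
                    inv[j] = k
                    improved = True
                    break
            if improved:
                break
    return inv


adj2 = {0: [1], 1: [0, 2], 2: [1]}
s_v = [[0.9, 0.2, 0.1], [0.1, 0.8, 0.7]]
result = local_search([0, 0, 1], 2, adj2, s_v)
```

Starting from the assignment `[0, 0, 1]`, the only improving feasible move reassigns image vertex 1 to model vertex 1, after which no candidate move remains and the search stops at `[0, 1, 1]`. Image vertex 0 can never be moved there, since it is the only vertex associated with model vertex 0, which is the blocking situation that motivates the swap neighborhood.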

5 Computational Results 

The algorithms described in the previous sections were implemented in C and 
compiled with version 2.96 of the gcc compiler. We used an implementation in 
C of the random number generator described in [18]. All computational exper- 
iments were performed on a 450 MHz Pentium II computer with 128 Mbytes 
of RAM memory, running under version 7.1 of the Red Hat Linux operating 
system. 

Unlike other problems in the literature, there are no benchmark instances 
available for the problem studied in this paper. We describe below a subset of 
seven test instances used in the evaluation of the model and the algorithms 
proposed in Sections 3 and 4. 

Instances GM-5, GM-8, and GM-9 were randomly generated [1], with node 
and edge similarity values in the interval [0,1]. Instance GM-8 was also used 
in the computational experiments reported in [1]. Instances GM-5 and GM-8 
have isolated nodes: two in the image graph G 2 of GM-5 and three in the model 
graph Gi of GM-8. Instances GM-5a and GM-8a are derived from them, by the 
introduction of additional edges to connect the isolated nodes. 

Instances GM-6 and GM-7 were provided by Perchant and Bengoetxea [12, 
14] and built from real images. Instance GM-6 was built from magnetic resonance 
images of a human brain, as depicted in Figure 4. Instance GM-7 was created 
for the computational experiments reported in [14] from the 2D images given in 
Figure 5. The image (a) was over-segmented into 28 regions (c) and compared 
with a model with only 14 well defined regions (b). The model graph $G_1$ has 14 
vertices and 27 edges, while the over-segmented image graph $G_2$ has 28 vertices 
and 63 edges. Grey levels were used in the computation of node similarities, while 
distances and adjacencies were used for the computation of edge similarities. 





Fig. 4. Instance GM-6: (a) original image, (b) model, and (c) over-segmented image. 







Fig. 5. Cut of a muscle (instance GM-7): (a) original 2D image, (b) model, and (c) 
over-segmented image. 



We summarize the characteristics of instances GM-5 to GM-9 in Table 1. 
For each instance, we first give the number of nodes and edges of the model 
and image graphs. We also give the optimal value $f^*$ obtained by the exact 
integer programming formulation proposed by Duarte [8] using the mixed integer 
programming solver CPLEX 9.0 and the associated computation time in seconds 
on a 2.0 GHz Pentium IV computer (whenever available), considering exclusively 
the vertex contribution to the objective function. In the last two columns, we 
give the value $f_{LS+}$ of the solution obtained by the randomized construction 
algorithm followed by the application of the extended local search procedure 
LS+ and the total computation time in seconds, with the maximum number of 




110 M.C. Boeres, C.C. Ribeiro, and I. Bloch 



Table 1. Characteristics and exact results for instances GM-5 to GM-9. 

Instance   |N1|   |E1|   |N2|   |E2|   f*           time (s)      f_LS+    time (s) 
GM-5        10     15     30     39    0.5676        7113.34      0.5534     0.01 
GM-5a       10     15     30     41    0.5690        2559.45      0.5460     0.02 
GM-6        12     42     95   1434    0.4294       23668.17      0.4286     2.68 
GM-7        14     27     28     63    0.6999         113.84      0.6949   < 10^-? 
GM-8        30     39    100    297    (a) 0.5331   (a) 4.27      0.5209     1.02 
GM-8a       30     42    100    297    (a) 0.5331   (a) 4.12      0.5209     1.02 
GM-9        50     88    250   1681    -            -             0.5204    42.26 

(a) linear programming relaxation 
-   not available 

attempts to find a feasible solution set at MaxTrials = 500 and the maximum 
number of feasible solutions built set at MaxSolutions = 100. 

The results in Table 1 illustrate the effectiveness of the heuristics proposed 
in this work. The non-bijective graph matching problem can be exactly solved 
only for small problems by a state-of-the-art solver such as CPLEX 9.0. Even 
the medium size instances GM-8 and GM-8a cannot be exactly solved. Only the 
linear programming bounds can be computed in reasonable computation times for 
both of them. On the other hand, the combination of the randomized construction 
algorithm with the local search procedure provides high quality solutions in 
very small computation times. Good approximate solutions for the medium size 
instances GM-8 and GM-8a (which were not exactly solved by CPLEX) within 
2.3% of optimality can be easily computed in processing times as low as one 
second. 
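The 2.3% figure can be checked from the values reported in Table 1 for GM-8 and GM-8a, taking the linear programming bound 0.5331 as the reference and 0.5209 as the heuristic value:

```latex
\[
\frac{0.5331 - 0.5209}{0.5331} = \frac{0.0122}{0.5331} \approx 0.0229 \approx 2.3\% .
\]
```

Since the reference is a relaxation bound rather than the optimal value, this ratio is an upper bound on the true gap to optimality.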

Table 2 illustrates the results obtained by the randomized construction 
algorithm and the extended local search procedure for instances GM-5 to GM-9 
with $\alpha = 0.9$. The maximum number of attempts to find a feasible solution was 
fixed at MaxTrials = 500 and the maximum number of feasible solutions built 
was fixed at MaxSolutions = 100. For each instance, we give the number of 
attempts necessary to find the first feasible solution, the value $f^{(1)}$ of the first 
feasible solution found, the number of attempts necessary to find the best among 
the first 100 feasible solutions built, the value $f^{(100)}$ of the best feasible solution 
found, and the average computation time per attempt in seconds. The last three 
columns report statistics for the extended local search algorithm: the number of 
local search iterations until local optimality, the value $f_{LS+}$ of the best solution 
found, and the average computation time per iteration in seconds. 

The computation time taken by each attempt of the randomized construc- 
tion algorithm to build a feasible solution is very small, even for the largest 
instances. The algorithm is very fast and finds the first feasible solution in only 
a few attempts, except in the cases of the difficult instances with isolated nodes. 
However, even in the case of the hard instance GM-5, the algorithm managed 
to find a feasible solution after 297 attempts. For the other instances, the 
construction algorithm found a feasible solution in very few iterations. Even better 
solutions can be obtained if additional attempts are performed. 

The local search algorithm improved the solutions built by the construction 
algorithm for all test instances. The average improvement with respect to the 
value of the solution obtained by the construction algorithm was approximately 
3%. 



Table 2. Results obtained by the randomized construction algorithm and the extended 
local search procedure with MaxTrials = 500 and MaxSolutions = 100. 

Instance   first   f^(1)    best   f^(100)   time (s)   iteration   f_LS+    time (s) 
GM-5        297    0.5168    297   0.5168    < 10^-?        19      0.5474    0.002 
GM-5a         9    0.4981    417   0.5243    < 10^-?        13      0.5434    0.002 
GM-6          1    0.4122     40   0.4168    0.001         320      0.4248    0.020 
GM-7          5    0.6182     34   0.6282    < 10^-?        12      0.6319    0.001 
GM-8         26    0.4978    292   0.5022    0.002         118      0.5186    0.014 
GM-8a        26    0.5014    292   0.5058    0.002         120      0.5222    0.014 
GM-9          1    0.5049    207   0.5060    0.010         511      0.5187    0.134 


6 Concluding Remarks 

We formulated the problem of finding the best correspondence between two 
graphs representing a model and an over-segmented image as a combinatorial 
optimization problem. 

A robust randomized construction algorithm was proposed to build feasible 
solutions for the graph matching problem. We also proposed a local search algo- 
rithm based on two neighborhood structures to improve the solutions built by 
the construction algorithm. Computational results were presented for different 
test problems. Both algorithms are fast and easily found feasible solutions to 
realistic problems with up to 250 nodes and 1681 edges in the graph represent- 
ing the over-segmented image. The local search algorithm consistently improved 
the solutions found by the construction heuristic. Both algorithms can be easily 
adapted to handle more complex objective function formulations. 

Besides the quality of the solutions found, the randomized algorithm may also 
be used as a robust generator of initial solutions for population methods such as 
the genetic algorithm described in [14], replacing the low quality randomly gener- 
ated solutions originally proposed. The construction and local search algorithms 
can also be put together into an implementation of the GRASP metaheuristic 
[16]. 

Abstract. Let $\mathcal{C}$ be an n-dimensional integral box, and $\pi$ be a monotone 
property defined over the elements of $\mathcal{C}$. We consider the problems 
of incrementally generating jointly the families $\mathcal{F}_\pi$ and $\mathcal{G}_\pi$ of all minimal 
subsets satisfying property $\pi$ and all maximal subsets not satisfying 
property $\pi$, when $\pi$ is given by a polynomial-time satisfiability oracle. 
Problems of this type arise in many practical applications. It is known 
that the above joint generation problem can be solved in incremental 
quasi-polynomial time. In this paper, we present an efficient implementation 
of this procedure. We present experimental results to evaluate our 
implementation for a number of interesting monotone properties $\pi$. 



1 Introduction 

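Let $\mathcal{C} = C_1 \times \cdots \times C_n$ be an integral box, where $C_1, \ldots, C_n$ are finite sets of integers. 
For a subset $\mathcal{A} \subseteq \mathcal{C}$, let us denote by $\mathcal{A}^+ = \{x \in \mathcal{C} \mid x \geq a$, for some $a \in \mathcal{A}\}$ and 
$\mathcal{A}^- = \{x \in \mathcal{C} \mid x \leq a$, for some $a \in \mathcal{A}\}$, the ideal and filter generated by $\mathcal{A}$. Any 
element in $\mathcal{C} \setminus \mathcal{A}^+$ is called independent of $\mathcal{A}$, and we let $\mathcal{I}(\mathcal{A})$ denote the set of 
all maximal independent elements for $\mathcal{A}$. Call a family of vectors $\mathcal{A}$ Sperner if $\mathcal{A}$ 
is an antichain, i.e. if no two elements are comparable in $\mathcal{A}$. If $\mathcal{C}$ is the Boolean 
cube $2^{[n]}$, we get the well-known definitions of a hypergraph $\mathcal{A}$ and its family of 
maximal independent sets $\mathcal{I}(\mathcal{A})$. 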
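For very small boxes, these definitions can be explored by brute force. The sketch below is only meant to illustrate the notions of $\mathcal{A}^+$ and $\mathcal{I}(\mathcal{A})$: it enumerates the whole box, which is exponential in $n$, and the function names are assumptions.

```python
from itertools import product


def leq(x, y):
    """Componentwise order on vectors of the box."""
    return all(a <= b for a, b in zip(x, y))


def maximal_independent(box, A):
    """Return I(A): the maximal elements of the box that lie outside A^+.

    box : list of per-coordinate value ranges, e.g. [range(2)] * 3
    A   : list of vectors (tuples) of the box
    """
    C = list(product(*box))
    # x is independent of A if it dominates no element of A (x not in A^+).
    indep = [x for x in C if not any(leq(a, x) for a in A)]
    return sorted(x for x in indep
                  if not any(x != y and leq(x, y) for y in indep))


# Boolean cube with n = 3; A encodes the hyperedges {1,2} and {2,3}.
cube = [range(2)] * 3
A = [(1, 1, 0), (0, 1, 1)]
I_A = maximal_independent(cube, A)
```

Here `I_A` consists of the characteristic vectors `(0, 1, 0)` and `(1, 0, 1)`, i.e. the maximal independent sets {2} and {1,3} of the hypergraph.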

Let $\pi : \mathcal{C} \to \{0,1\}$ be a monotone property defined over the elements of $\mathcal{C}$: 
if $x \in \mathcal{C}$ satisfies property $\pi$, i.e. $\pi(x) = 1$, then any $y \in \mathcal{C}$ such that $y \geq x$ also 
satisfies $\pi$. We assume that $\pi$ is described by a polynomial satisfiability oracle 
$O_\pi$, i.e. an algorithm that can decide whether a given vector $x \in \mathcal{C}$ satisfies $\pi$, 
in time polynomial in $n$ and the size $|\pi|$ of the input description of $\pi$. Denote 
respectively by $\mathcal{F}_\pi$ and $\mathcal{G}_\pi$ the families of minimal elements satisfying property $\pi$, 
and maximal elements not satisfying property $\pi$. Then it is clear that $\mathcal{G}_\pi = \mathcal{I}(\mathcal{F}_\pi)$ 
for any monotone property $\pi$. Given a monotone property $\pi$, we consider the 
problem of jointly generating the families $\mathcal{F}_\pi$ and $\mathcal{G}_\pi$. 

GEN($\mathcal{C}, \mathcal{F}_\pi, \mathcal{G}_\pi, \mathcal{X}, \mathcal{Y}$): Given a monotone property $\pi$, represented by a 
satisfiability oracle $O_\pi$, and two explicitly listed vector families $\mathcal{X} \subseteq \mathcal{F}_\pi \subseteq \mathcal{C}$ and 
$\mathcal{Y} \subseteq \mathcal{G}_\pi \subseteq \mathcal{C}$, either find a new element in $(\mathcal{F}_\pi \setminus \mathcal{X}) \cup (\mathcal{G}_\pi \setminus \mathcal{Y})$, or prove that 
these families are complete: $(\mathcal{X}, \mathcal{Y}) = (\mathcal{F}_\pi, \mathcal{G}_\pi)$. 

* This research was supported by the National Science Foundation (Grant 
IIS-0118635). The third author is also grateful for the partial support by DIMACS, 
the National Science Foundation's Center for Discrete Mathematics and Theoretical 
Computer Science. 

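It is clear that for a given monotone property $\pi$, described by a satisfiability 
oracle $O_\pi$, we can generate both $\mathcal{F}_\pi$ and $\mathcal{G}_\pi$ simultaneously by starting with 
$\mathcal{X} = \mathcal{Y} = \emptyset$ and solving problem GEN($\mathcal{C}, \mathcal{F}_\pi, \mathcal{G}_\pi, \mathcal{X}, \mathcal{Y}$) for a total of 
$|\mathcal{F}_\pi| + |\mathcal{G}_\pi| + 1$ times, incrementing in each iteration either $\mathcal{X}$ or $\mathcal{Y}$ by the newly 
found vector $x \in (\mathcal{F}_\pi \setminus \mathcal{X}) \cup (\mathcal{G}_\pi \setminus \mathcal{Y})$, according to the answer of the oracle $O_\pi$, 
until we have $(\mathcal{X}, \mathcal{Y}) = (\mathcal{F}_\pi, \mathcal{G}_\pi)$. 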
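The incremental loop described above can be sketched as follows. The step function used here is a brute-force stand-in for GEN that scans the whole box, so it illustrates only the control flow and the count of $|\mathcal{F}_\pi| + |\mathcal{G}_\pi| + 1$ calls, not the quasi-polynomial procedure; the names `gen_step` and `joint_generation` and the example property are assumptions.

```python
from itertools import product


def leq(x, y):
    return all(a <= b for a, b in zip(x, y))


def gen_step(C, pi, X, Y):
    """Brute-force stand-in for GEN: return a new element of (F_pi minus X)
    or (G_pi minus Y), or None if the pair (X, Y) is already complete."""
    for x in C:
        if pi(x):
            # Minimal satisfying vector: every strictly smaller vector fails pi.
            if x not in X and all(not pi(y) for y in C if y != x and leq(y, x)):
                return x
        else:
            # Maximal non-satisfying vector: every strictly larger one satisfies pi.
            if x not in Y and all(pi(y) for y in C if y != x and leq(x, y)):
                return x
    return None


def joint_generation(box, pi):
    C = list(product(*box))
    X, Y = [], []          # partial lists of F_pi and G_pi
    calls = 0
    while True:
        calls += 1
        new = gen_step(C, pi, X, Y)
        if new is None:    # the last call certifies completeness
            return sorted(X), sorted(Y), calls
        # The oracle tells which family the new vector belongs to.
        (X if pi(new) else Y).append(new)


# Hypothetical monotone property on the box {0,1,2}^2: coordinate sum >= 3.
F, G, calls = joint_generation([range(3), range(3)], lambda x: x[0] + x[1] >= 3)
```

On this example the loop finds two minimal satisfying vectors and three maximal non-satisfying ones, and stops after six calls, the last of which only certifies that both lists are complete.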

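In most practical applications, the requirement is to generate either the 
family $\mathcal{F}_\pi$ or the family $\mathcal{G}_\pi$, i.e. we consider the separate generation problems: 

GEN($\mathcal{C}, \mathcal{F}_\pi, \mathcal{X}$) (GEN($\mathcal{C}, \mathcal{G}_\pi, \mathcal{Y}$)): Given a monotone property $\pi$ and a 
subfamily $\mathcal{X} \subseteq \mathcal{F}_\pi \subseteq \mathcal{C}$ (respectively, $\mathcal{Y} \subseteq \mathcal{G}_\pi \subseteq \mathcal{C}$), either find a new minimal 
satisfying vector $x \in \mathcal{F}_\pi \setminus \mathcal{X}$ (respectively, maximal non-satisfying vector 
$x \in \mathcal{G}_\pi \setminus \mathcal{Y}$), or prove that the given partial list is complete: $\mathcal{X} = \mathcal{F}_\pi$ 
(respectively, $\mathcal{Y} = \mathcal{G}_\pi$). 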

Problems GEN($\mathcal{C}, \mathcal{F}_\pi, \mathcal{X}$) and GEN($\mathcal{C}, \mathcal{G}_\pi, \mathcal{Y}$) arise in many practical 
applications and in a variety of fields, including artificial intelligence [14], game theory 
[18,19], reliability theory [8,12], database theory [9,14,17], integer programming 
[4,6,20], learning theory [1], and data mining [2,6,9]. Even though these two 
problems may be NP-hard in general (see e.g. [20]), it is known that the joint 
generation problem GEN($\mathcal{C}, \mathcal{F}_\pi, \mathcal{G}_\pi, \mathcal{X}, \mathcal{Y}$) can be solved in incremental 
quasi-polynomial time $\mathrm{poly}(n, \log \|\mathcal{C}\|_\infty) + m^{\mathrm{polylog}\, m}$ for any monotone property $\pi$ 
described by a satisfiability oracle, where $\|\mathcal{C}\|_\infty = \max_{i=1}^{n} |C_i|$ and $m = |\mathcal{X}| + |\mathcal{Y}|$, 
see [4,15]. In particular, there is a polynomial-time reduction from the joint 
generation problem to the following problem, known as dualization on boxes, see [4, 
10,16]: 

DUAL($\mathcal{C}, \mathcal{A}, \mathcal{B}$): Given a family of vectors $\mathcal{A} \subseteq \mathcal{C}$, and a subset $\mathcal{B} \subseteq \mathcal{I}(\mathcal{A})$ of its 
maximal independent vectors, either find a new maximal independent vector 
$x \in \mathcal{I}(\mathcal{A}) \setminus \mathcal{B}$, or prove that no such vector exists, i.e., $\mathcal{B} = \mathcal{I}(\mathcal{A})$. 

The currently best known algorithm for dualization runs in time $\mathrm{poly}(n) + m^{o(\log m)}$,
see [4,15]. Unfortunately, this joint generation may not be an efficient
algorithm for solving either of GEN($\mathcal{C}, \mathcal{F}_\pi, \mathcal{X}$) or GEN($\mathcal{C}, \mathcal{G}_\pi, \mathcal{Y}$) separately
for the simple reason that we do not control which of the families $\mathcal{F}_\pi \setminus \mathcal{X}$ and
$\mathcal{G}_\pi \setminus \mathcal{Y}$ contains each new vector produced by the algorithm. Suppose we want
to generate $\mathcal{F}_\pi$ and the family $\mathcal{G}_\pi$ is exponentially larger than $\mathcal{F}_\pi$. Then, if we
are unlucky, we may get elements of $\mathcal{F}_\pi$ with exponential delay, while getting
large subfamilies of $\mathcal{G}_\pi$ (which are not needed at all) in between. However, there
are two reasons that we are interested in solving the joint generation problem
efficiently. First, it is easy to see [17] that no satisfiability oracle based algorithm
can generate $\mathcal{F}_\pi$ in fewer than $|\mathcal{F}_\pi| + |\mathcal{G}_\pi|$ steps, in general:

Proposition 1. Consider an arbitrary algorithm A, which generates the family
$\mathcal{F}_\pi$ by using only a satisfiability oracle $\mathcal{O}_\pi$ for the monotone property $\pi$. Then,
for the algorithm to generate the whole family $\mathcal{F}_\pi$, it must call the oracle at least
$|\mathcal{F}_\pi| + |\mathcal{G}_\pi|$ times.

Proof. Clearly, any monotone property $\pi$ can be defined by its value on the
boundary $\mathcal{F}_\pi \cup \mathcal{G}_\pi$. For any $y \in \mathcal{F}_\pi \cup \mathcal{G}_\pi$ let us thus define the
monotone property $\pi_y$ as follows:

$$\pi_y(x) = \begin{cases} \pi(x) & \text{if } x \neq y, \\ \overline{\pi}(x) & \text{if } x = y. \end{cases}$$

Then $\pi_y$ is a monotone property different from $\pi$, and algorithm A must be able
to distinguish the Sperner families described by $\pi$ and $\pi_y$ for every $y \in \mathcal{F}_\pi \cup \mathcal{G}_\pi$.

□



Second, for a wide class of Sperner families (or equivalently, monotone properties),
the so-called uniformly dual-bounded families, it was realized that the
size of the dual family $\mathcal{I}(\mathcal{F}_\pi)$ is uniformly bounded by a (quasi-)polynomial in
the size of $\mathcal{F}_\pi$ and the oracle description:

$$|\mathcal{I}(\mathcal{X}) \cap \mathcal{I}(\mathcal{F}_\pi)| \le (\text{quasi-})\mathrm{poly}(|\mathcal{X}|, |\pi|) \qquad (1)$$

for any non-empty subfamily $\mathcal{X} \subseteq \mathcal{F}_\pi$. An inequality of the form (1) would imply
that joint generation is an incrementally efficient way for generating the family
$\mathcal{F}_\pi$ (see Section 2 below and also [5] for several interesting examples).



In [7], we presented an efficient implementation of a quasi-polynomial dualization
algorithm on boxes. Direct application of this implementation to solve
the joint generation problem may not be very efficient, since each call to the
dualization code requires the construction of a new recursion tree, thereby wasting
information that might have been collected from previous calls. A much more
efficient approach, which we implement in this paper, is to use the same recursion
tree for generating all elements of the families $\mathcal{F}_\pi$ and $\mathcal{G}_\pi$. The details of
this method will be described in Sections 3 and 4. In Section 2, we give three
examples of monotone properties that will be used in our experimental study.
Finally, Section 5 presents our experimental findings, and Section 6 provides
some conclusions.

2 Examples of Monotone Properties

We consider the following three monotone properties in this paper: 

Monotone systems of linear inequalities. Let $A \in \mathbb{R}^{r \times n}$ be a given non-negative
real matrix, $b \in \mathbb{R}^r$ be a given r-vector, $c \in \mathbb{R}^n$ be a given non-negative n-vector,
and consider the system of linear inequalities:

$$Ax \ge b, \quad x \in \mathcal{C} = \{x \in \mathbb{Z}^n \mid 0 \le x \le c\}. \qquad (2)$$

For $x \in \mathcal{C}$, let $\pi_1(x)$ be the property that x satisfies (2). Then the families $\mathcal{F}_{\pi_1}$
and $\mathcal{G}_{\pi_1}$ correspond respectively to the minimal feasible and maximal infeasible
vectors for (2). It is known [4] that the family $\mathcal{F}_{\pi_1}$ is (uniformly) dual-bounded:

(3) 
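A satisfiability oracle for a monotone system of linear inequalities is straightforward to write down. The following Python sketch is a minimal illustration; the helper name `make_pi1` and the numeric data are assumptions chosen for the example.

```python
def make_pi1(A, b):
    """Return the oracle x -> [A x >= b] for a non-negative matrix A."""
    def pi1(x):
        return all(sum(a * xi for a, xi in zip(row, x)) >= bj
                   for row, bj in zip(A, b))
    return pi1

A = [[1, 2], [3, 1]]   # hypothetical non-negative constraint matrix
b = [4, 3]             # hypothetical right-hand side
pi1 = make_pi1(A, b)
```

Because A is non-negative, increasing any coordinate of a feasible x keeps it feasible, which is exactly the monotonicity the generation algorithm relies on.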



Minimal infrequent and maximal frequent sets in binary databases. Let $\mathcal{D} : R \times V \to \{0,1\}$
be a given $r \times n$ binary matrix representing a set R of transactions
over a set of attributes V. To each subset of columns $X \subseteq V$, let us associate
the subset $S(X) = S_{\mathcal{D}}(X) \subseteq R$ of all those rows $i \in R$ for which $\mathcal{D}(i,j) = 1$ in
every column $j \in X$. The cardinality of $S(X)$ is called the support of X. Given
an integer t, a column set $X \subseteq V$ is called t-frequent if $|S(X)| \ge t$ and otherwise
is said to be t-infrequent. For each set $X \in \mathcal{C} = 2^V$, let $\pi_2(X)$ be the property
that X is t-infrequent. Then $\pi_2$ is a monotone property and the families $\mathcal{F}_{\pi_2}$ and
$\mathcal{G}_{\pi_2}$ correspond respectively to minimal infrequent and maximal frequent sets for
$\mathcal{D}$. It is known [9] that

$$|\mathcal{I}(\mathcal{F}_{\pi_2})| \le (r - t + 1)\,|\mathcal{F}_{\pi_2}|. \qquad (4)$$

Problems GEN($\mathcal{C}, \mathcal{F}_{\pi_2}, \mathcal{X}$) and GEN($\mathcal{C}, \mathcal{G}_{\pi_2}, \mathcal{Y}$) appear in data mining applications,
see e.g. [17].
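The support and the t-infrequency test can be illustrated in a few lines of Python. This is a sketch only; the function names and the small database are hypothetical, and the database is assumed to be given as a list of 0/1 rows.

```python
def support(D, X):
    """Number of rows of the 0/1 matrix D having a 1 in every column of X."""
    return sum(1 for row in D if all(row[j] == 1 for j in X))

def pi2(D, X, t):
    """True when the column set X is t-infrequent, i.e. support(X) < t."""
    return support(D, X) < t

D = [[1, 1, 0],
     [1, 0, 1],
     [1, 1, 1]]        # hypothetical database with 3 transactions, 3 attributes
```

Adding columns to X can only shrink its support, so a superset of an infrequent set is again infrequent.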

Sparse boxes for multi-dimensional data. Let S be a set of points in $\mathbb{R}^n$, and
$t \le |S|$ be a given integer. A maximal t-box is a closed n-dimensional interval
which contains at most t points of S in its interior, and which is maximal with
respect to this property (i.e., cannot be extended in any direction without strictly
enclosing more points of S). Define $\mathcal{C}_i = \{p_i \mid p \in S\}$ for $i = 1, \ldots, n$ and
consider the family of boxes $\mathcal{B} = \{[a,b] \subseteq \mathbb{R}^n \mid a, b \in \mathcal{C}_1 \times \cdots \times \mathcal{C}_n,\ a \le b\}$.
For $i = 1, \ldots, n$, let $u_i = \max \mathcal{C}_i$, and let $\mathcal{C}_{i+n} \stackrel{\mathrm{def}}{=} \{u_i - p \mid p \in \mathcal{C}_i\}$ be the
chain ordered in the direction opposite to $\mathcal{C}_i$. Consider the 2n-dimensional box
$\mathcal{C} = \mathcal{C}_1 \times \cdots \times \mathcal{C}_n \times \mathcal{C}_{n+1} \times \cdots \times \mathcal{C}_{2n}$ and let us represent every n-dimensional interval
$[a,b] \in \mathcal{B}$ as the 2n-dimensional vector $(a, u - b) \in \mathcal{C}$, where $u = (u_1, \ldots, u_n)$.
This gives a monotone injective mapping $\mathcal{B} \to \mathcal{C}$ (not all elements of $\mathcal{C}$ define a
box, since $a_i > b_i$ is possible for $(a, u - b) \in \mathcal{C}$). Let us now define the monotone
property $\pi_3$ to be satisfied by an $x \in \mathcal{C}$ if and only if x does not define a box, or
the box defined by x contains at most t points of S in its interior. Then the sets
$\mathcal{F}_{\pi_3}$ and $\mathcal{G}_{\pi_3}$ can be identified respectively with the set of maximal t-boxes (plus
a polynomial number of non-boxes), and the set of minimal boxes $x \in \mathcal{B} \subseteq \mathcal{C}$
which contain at least $t + 1$ points of S in their interior. It is known [6] that the
family of maximal t-boxes is (uniformly) dual-bounded:

$$|\mathcal{I}(\mathcal{F}_{\pi_3})| \le |S|\,|\mathcal{F}_{\pi_3}|. \qquad (5)$$

The problem of generating all elements of $\mathcal{F}_{\pi_3}$ has been studied in the machine
learning and computational geometry literatures (see [11,13,23]), and is
motivated by the discovery of missing associations or "holes" in data mining
applications (see [3,21,22]). The algorithm given in [13] for solving this problem has
worst-case time complexity exponential in the dimension n of the given point
set.
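The encoding of a box as a 2n-dimensional vector and the resulting oracle can be sketched as follows. This Python fragment is an illustrative assumption about how such an oracle might be written; the names `encode_box` and `pi3` and the point set are hypothetical.

```python
def encode_box(a, b, u):
    """Map the box [a, b] to the 2n-vector (a, u - b)."""
    return tuple(a) + tuple(ui - bi for ui, bi in zip(u, b))

def pi3(x, S, u, t):
    """True if x is not a box, or its box has at most t points of S strictly inside."""
    n = len(u)
    a = x[:n]
    b = tuple(u[i] - x[n + i] for i in range(n))
    if any(ai > bi for ai, bi in zip(a, b)):
        return True                      # x does not define a box
    inside = sum(1 for p in S
                 if all(a[i] < p[i] < b[i] for i in range(n)))
    return inside <= t

S = [(1, 1), (2, 2), (3, 3)]   # hypothetical point set in the plane
u = (3, 3)                     # u_i = max C_i
x = encode_box((1, 1), (3, 3), u)
```

Increasing a coordinate of x either raises a lower corner or lowers an upper corner, so the box shrinks and the property is preserved.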

3 Terminology and Outline of the Algorithm 

Throughout the paper, we assume that we are given an integer box $\mathcal{C}^* = \mathcal{C}^*_1 \times \cdots \times \mathcal{C}^*_n$,
where $\mathcal{C}^*_i = [l^*_i : u^*_i]$, and $l^*_i \le u^*_i$ are integers, and a monotone
property $\pi$, described by a polynomial time satisfiability oracle, for which
it is required to generate the families $\mathcal{F}_\pi$ and $\mathcal{G}_\pi$. The generation algorithm
considered in this paper is based on a reduction to a dualization algorithm of
[4], for which an efficient implementation was given in [7]. For completeness, we
briefly outline this algorithm. The problem is solved by decomposing it into a
number of smaller subproblems and solving each of them recursively. The input
to each such subproblem is a sub-box $\mathcal{C}$ of the original box $\mathcal{C}^*$ and two subsets
$\mathcal{F} \subseteq \mathcal{F}^*$ and $\mathcal{G} \subseteq \mathcal{G}^*$ of integral vectors, where $\mathcal{F}^* \subseteq \mathcal{F}_\pi$ and $\mathcal{G}^* \subseteq \mathcal{G}_\pi$ denote
respectively the subfamilies of minimal satisfying and maximal non-satisfying
vectors that have been generated so far. Note that, by definition, the following
condition holds for the original problem and all subsequent subproblems:

$$a \not\le b, \quad \text{for all } a \in \mathcal{F},\ b \in \mathcal{G}. \qquad (6)$$

Given an element $a \in \mathcal{F}$ ($b \in \mathcal{G}$), we say that a coordinate $i \in [n] \stackrel{\mathrm{def}}{=} \{1, \ldots, n\}$
is essential for a (respectively, b), in the box $\mathcal{C} = [l_1 : u_1] \times \cdots \times [l_n : u_n]$, if
$a_i > l_i$ (respectively, if $b_i < u_i$). Let us denote by Ess(x) the set of essential
coordinates of an element $x \in \mathcal{F} \cup \mathcal{G}$. Finally, given a sub-box $\mathcal{C} \subseteq \mathcal{C}^*$, and two
subsets $\mathcal{F} \subseteq \mathcal{F}^*$ and $\mathcal{G} \subseteq \mathcal{G}^*$, we shall say that $\mathcal{F}$ is dual to $\mathcal{G}$ in $\mathcal{C}$ if $\mathcal{F}^+ \cup \mathcal{G}^- \supseteq \mathcal{C}$.

A key lemma, on which the algorithm in [4] is based, is that either (i) there
is an element $x \in \mathcal{F} \cup \mathcal{G}$ with at most $1/\epsilon$ essential coordinates, where
$\epsilon \stackrel{\mathrm{def}}{=} 1/(1 + \log m)$ and $m \stackrel{\mathrm{def}}{=} |\mathcal{F}| + |\mathcal{G}|$, or (ii) one can easily find a new element
$z \in \mathcal{C} \setminus (\mathcal{F}^+ \cup \mathcal{G}^-)$, by picking each element $z_i$ independently at random from
$\{l_i, u_i\}$ for $i = 1, \ldots, n$; see subroutine Random solution() in the next section. In
case (i), one can decompose the problem into two strictly smaller subproblems as
follows. Assume, without loss of generality, that $x \in \mathcal{F}$ has at most $1/\epsilon$ essential
coordinates. Then, by (6), there is an $i \in [n]$ such that $|\{b \in \mathcal{G} : b_i < x_i\}| \ge \epsilon|\mathcal{G}|$.
This allows us to decompose the original problem into two subproblems
GEN($\mathcal{C}', \pi, \mathcal{F}, \mathcal{G}'$) and GEN($\mathcal{C}'', \pi, \mathcal{F}'', \mathcal{G}$), where
$\mathcal{C}' = \mathcal{C}_1 \times \cdots \times \mathcal{C}_{i-1} \times [x_i : u_i] \times \mathcal{C}_{i+1} \times \cdots \times \mathcal{C}_n$, $\mathcal{G}' = \mathcal{G} \cap (\mathcal{C}')^+$,
$\mathcal{C}'' = \mathcal{C}_1 \times \cdots \times \mathcal{C}_{i-1} \times [l_i : x_i - 1] \times \mathcal{C}_{i+1} \times \cdots \times \mathcal{C}_n$,
and $\mathcal{F}'' = \mathcal{F} \cap (\mathcal{C}'')^-$. This way, the algorithm is guaranteed to reduce the cardinality
of one of the sets $\mathcal{F}$ or $\mathcal{G}$ by a factor of at least $1 - \epsilon$ at each recursive step.
For efficiency reasons, we make two modifications to this basic approach. First,
we use sampling to estimate the sizes of the sets $\mathcal{G}'$ and $\mathcal{F}''$ (see subroutine Est()
below). Second, once we have determined the new sub-boxes $\mathcal{C}', \mathcal{C}''$ above, we
do not compute the active families $\mathcal{G}'$ and $\mathcal{F}''$ at each recursion step (this is
called the Cleanup step in the next section). Instead, we perform the cleanup
step only when the number of vectors reduces by a certain factor f, say 1/2,
for two reasons: First, this improves the running time since the elimination of
vectors is done less frequently. Second, the expected total memory required by
all the nodes of the path from the root of the recursion tree to a leaf is at most
$O(nm + m/(1 - f))$, which is linear in m for constant $f < 1$.
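The decomposition step in case (i) can be sketched in Python as follows. This is an illustration under simplifying assumptions: the sets are counted exactly rather than estimated by sampling, boxes are lists of (lo, hi) pairs, and the names `essential_for_F` and `split_box` are hypothetical.

```python
def essential_for_F(a, box):
    """Coordinates i with a_i > l_i, i.e. the essential coordinates of a in F."""
    return [i for i, (lo, _) in enumerate(box) if a[i] > lo]

def split_box(x, G, box):
    """Pick the essential coordinate of x cutting off most of G, then split the box."""
    ess = essential_for_F(x, box)
    i = max(ess, key=lambda j: sum(1 for b in G if b[j] < x[j]))
    lo, hi = box[i]
    upper = box[:i] + [(x[i], hi)] + box[i + 1:]       # C'  = ... x [x_i : u_i] x ...
    lower = box[:i] + [(lo, x[i] - 1)] + box[i + 1:]   # C'' = ... x [l_i : x_i - 1] x ...
    return i, upper, lower

box = [(0, 3), (0, 3)]
x = (2, 1)                        # hypothetical element of F
G = [(1, 3), (3, 0), (1, 2)]      # hypothetical elements of G, none of them >= x
i, upper, lower = split_box(x, G, box)
```

In the upper box the vectors of G with $b_i < x_i$ become inactive, and in the lower box x itself becomes inactive, which is what makes both subproblems strictly smaller.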



4 The Algorithm 

We use the following data structures in our implementation: 

— Two arrays of vectors, F and G, containing the elements of $\mathcal{F}^*$ and $\mathcal{G}^*$
respectively.

— Two (dynamic) arrays of indices, index($\mathcal{F}$) and index($\mathcal{G}$), containing the
indices of vectors from $\mathcal{F}^*$ and $\mathcal{G}^*$ (i.e. containing pointers to elements of
the arrays F and G) that appear in the current subproblem. These arrays
are used to enable sampling from the sets $\mathcal{F}$ and $\mathcal{G}$, and also to keep track
of which vectors are currently active, i.e., intersect the current box.

— Two balanced binary search trees T($\mathcal{F}^*$) and T($\mathcal{G}^*$), built on the elements of
$\mathcal{F}^*$ and $\mathcal{G}^*$ respectively using lexicographic ordering. Each node of the tree
T($\mathcal{F}^*$) (T($\mathcal{G}^*$)) contains an index of an element in the array F (G). This
way, checking whether a given vector $x \in \mathcal{C}$ belongs to $\mathcal{F}^*$ ($\mathcal{G}^*$) or not takes
only $O(n \log |\mathcal{F}^*|)$ ($O(n \log |\mathcal{G}^*|)$) time.
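The lexicographic membership test can be mimicked in Python with a sorted list and binary search. This is only a stand-in for the balanced search trees described above: lookups cost $O(n \log N)$ comparisons as in the text, but insertion into a Python list is linear, so the class below (a hypothetical `VectorIndex`) illustrates the interface rather than the exact data structure.

```python
import bisect

class VectorIndex:
    """Set of integer vectors kept in lexicographic order."""

    def __init__(self):
        self._sorted = []

    def add(self, v):
        """Insert v; return False if it was already present."""
        v = tuple(v)
        pos = bisect.bisect_left(self._sorted, v)
        if pos < len(self._sorted) and self._sorted[pos] == v:
            return False
        self._sorted.insert(pos, v)
        return True

    def __contains__(self, v):
        v = tuple(v)
        pos = bisect.bisect_left(self._sorted, v)
        return pos < len(self._sorted) and self._sorted[pos] == v

idx = VectorIndex()
first = idx.add((1, 2))
second = idx.add((1, 2))
```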



In the sequel, we let $m = |\mathcal{F}| + |\mathcal{G}|$ and $\epsilon = 1/(1 + \log m)$. We use the
following subroutines in our implementation:

Minimization $\min_{\mathcal{F}_\pi}(z)$. It takes as input a vector $z \in \mathcal{F}^+_\pi$ and returns a minimal
vector $z^*$ in $\mathcal{F}^+_\pi \cap \{z\}^-$. Such a vector $z^* = \min_{\mathcal{F}_\pi}(z)$ can, for instance, be
computed by coordinate descent:

$$z^*_1 \leftarrow \min\{y_1 \mid (y_1, y_2, \ldots, y_{n-1}, y_n) \in \mathcal{F}^+_\pi \cap \{z\}^-\},$$
$$z^*_2 \leftarrow \min\{y_2 \mid (z^*_1, y_2, \ldots, y_{n-1}, y_n) \in \mathcal{F}^+_\pi \cap \{z\}^-\},$$
$$\vdots$$
$$z^*_n \leftarrow \min\{y_n \mid (z^*_1, z^*_2, \ldots, z^*_{n-1}, y_n) \in \mathcal{F}^+_\pi \cap \{z\}^-\}.$$

Note that each of the n coordinate steps in the above procedure can be reduced
via binary search to at most $\log(\|\mathcal{C}\|_\infty + 1)$ satisfiability oracle calls for the
monotone property $\pi$.
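The coordinate descent with binary search can be sketched in Python as below. The sketch assumes that $\pi$ is monotone and that $\pi(z)$ holds on entry; it fixes the not yet processed coordinates at their values in z, which by monotonicity gives the same minima as in the formulas above. The function name `minimize` and the example property are hypothetical.

```python
def minimize(z, lower, pi):
    """Return a minimal satisfying vector below z, one coordinate at a time.

    z:     satisfying vector (pi(z) is assumed to hold).
    lower: lower bounds l_i of the box.
    pi:    monotone oracle on tuples.
    """
    z = list(z)
    for i in range(len(z)):
        lo, hi = lower[i], z[i]        # invariant: pi holds with z[i] = hi
        while lo < hi:
            mid = (lo + hi) // 2
            z[i] = mid
            if pi(tuple(z)):
                hi = mid               # mid still satisfies, search below
            else:
                lo = mid + 1           # mid fails, smallest value is above
        z[i] = lo
    return tuple(z)

# Hypothetical example: x1 + 2*x2 >= 4 on the box {0,...,3}^2.
result = minimize((3, 3), (0, 0), lambda x: x[0] + 2 * x[1] >= 4)
```

Each coordinate needs a number of oracle calls logarithmic in the length of its range, in line with the bound stated above.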

More efficient procedures can be obtained if we specialize this routine to the
specific monotone property under consideration. For instance, for property $\pi_1$
this operation can be performed in $O(nr)$ steps as follows. For $j = 1, \ldots, r$,
let $a_j x \ge b_j$ be the jth inequality of the system. We initialize $z^* \leftarrow z$ and
$w_j = a_j z^* - b_j$ for $j = 1, \ldots, r$. For the ith step of the coordinate descent
operation, we let $z^*_i \leftarrow z^*_i - \lfloor\lambda\rfloor$ and $w_j \leftarrow w_j - \lfloor\lambda\rfloor a_{ji}$ for $j = 1, \ldots, r$, where

$$\lambda = \min_{1 \le j \le r} \left\{ \frac{w_j}{a_{ji}} : a_{ji} > 0 \right\}.$$

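Now consider property $\pi_2$. Given a set $Z \in \mathcal{F}^+_{\pi_2}$, the operation $\min_{\mathcal{F}_{\pi_2}}(Z)$
can be done in $O(nr)$ by initializing $Z^* \leftarrow Z$, $s \leftarrow |S(Z)|$ and $c(Y) \leftarrow |Z \setminus Y|$
for all $Y \in \mathcal{D}$, and repeating, for $i \in Z$, the following two steps: (i) $\mathcal{Y} \leftarrow \{Y \in \mathcal{D} : c(Y) = 1,\ Y \not\ni i\}$;
(ii) if $|\mathcal{Y}| + s \le t - 1$ then 1. $Z^* \leftarrow Z^* \setminus \{i\}$, 2. $s \leftarrow s + |\mathcal{Y}|$,
and 3. $c(Y) \leftarrow c(Y) - 1$ for each $Y \in \mathcal{D}$ such that $Y \not\ni i$.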

For the monotone property $\pi_3$ and $z \in \mathcal{C}$, the operation $\min_{\mathcal{F}_{\pi_3}}(z)$ can be
done in $O(n|S|)$ as follows. For each point $p \in S$, let $p' \in \mathbb{R}^{2n}$ be the point with
components $p'_i = p_i$ for $i = 1, \ldots, n$, and $p'_i = u_{i-n} - p_{i-n}$ for $i = n+1, \ldots, 2n$.
Initialize $s(z) \leftarrow |\{p \in S : p \text{ is in the interior of } z\}|$ and $c(p) \leftarrow |\{i \in [2n] : p'_i \le z_i\}|$
for all $p \in S$. Repeat, for $i = 1, \ldots, 2n$, the following steps:
(i) $z^*_i \leftarrow \min\{p'_i : p \in S,\ c(p) = 1,\ p'_i \le z_i \text{ and } |\{q \in S : c(q) = 1,\ p'_i < q'_i \le z_i\}| \le t - s(z)\}$;
(ii) $c(p) \leftarrow c(p) - 1$ for each $p \in S$ such that $z^*_i < p'_i \le z_i$. Note that (i) can
be performed in $O(|S|)$ steps assuming that we know the sorted order for the
points along each coordinate.

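Maximization $\max_{\mathcal{G}_\pi}(z)$. It computes, for a given vector $z \in \mathcal{G}^-_\pi$, a maximal
vector $z^* \in \mathcal{G}^-_\pi \cap \{z\}^+$. Similar to $\min_{\mathcal{F}_\pi}(z)$, this problem can be solved, in general, by
coordinate descent. For $\mathcal{G}_{\pi_1}$, $\mathcal{G}_{\pi_2}$ and $\mathcal{G}_{\pi_3}$, this operation can be done in $O(nr)$,
$O(nr)$, and $O(n|S|)$ respectively.

Below, we denote respectively by $T_{\min}$ and $T_{\max}$ the maximum time taken
by the routines $\min_{\mathcal{F}_\pi}(z)$ and $\max_{\mathcal{G}_\pi}(z)$ on any point $z \in \mathcal{C}$.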

Exhaustive duality($\mathcal{C}, \pi, \mathcal{F}, \mathcal{G}$). Assuming $|\mathcal{F}||\mathcal{G}| \le 1$, check if there are no
other vectors in $\mathcal{C} \setminus (\mathcal{F}^+ \cup \mathcal{G}^-)$ as follows. First, if $|\mathcal{F}| = |\mathcal{G}| = 1$ then find an
$i \in [n]$ such that $a_i > b_i$, where $\mathcal{F} = \{a\}$ and $\mathcal{G} = \{b\}$. (Such a coordinate is
guaranteed to exist by (6).)

1. If there is a $j \ne i$ such that $b_j < u_j$ then let $z = (u_1, \ldots, u_{i-1}, a_i - 1, u_{i+1}, \ldots, u_n)$.

2. Otherwise, if there is a $j \ne i$ such that $a_j > l_j$ then let $z = (u_1, \ldots, u_{j-1}, a_j - 1, u_{j+1}, \ldots, u_n)$.

3. If $b_i < a_i - 1$ then let $z = (u_1, \ldots, u_{i-1}, a_i - 1, u_{i+1}, \ldots, u_n)$.

In cases 1, 2 and 3, return either $\min_{\mathcal{F}_\pi}(z)$ or $\max_{\mathcal{G}_\pi}(z)$ depending on whether
$\pi(z) = 1$ or $\pi(z) = 0$, respectively.

4. Otherwise return FALSE (meaning that $\mathcal{F}$ and $\mathcal{G}$ are dual in $\mathcal{C}$).

Second, if $|\mathcal{F}| = 0$ then check satisfiability of u. If $\pi(u) = 1$ then return
$\min_{\mathcal{F}_\pi}(u)$. Else, if $\pi(u) = 0$ then let $z = \max_{\mathcal{G}_\pi}(u)$, and return either FALSE or
z depending on whether $z \in \mathcal{G}^*$ or not (this check can be done in $O(n \log |\mathcal{G}^*|)$
using the search tree T($\mathcal{G}^*$)). Finally, if $|\mathcal{G}| = 0$ then check satisfiability of l.
If $\pi(l) = 0$ then return $\max_{\mathcal{G}_\pi}(l)$. Else, if $\pi(l) = 1$ then let $z = \min_{\mathcal{F}_\pi}(l)$, and
return either FALSE or z depending on whether $z \in \mathcal{F}^*$ or not. This step takes
$O(\max\{n \log |\mathcal{F}^*| + T_{\min},\ n \log |\mathcal{G}^*| + T_{\max}\})$ time.
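The case analysis for $|\mathcal{F}| = |\mathcal{G}| = 1$ can be sketched in Python as follows. The sketch only returns the probe point z (or `None` for case 4) and leaves the subsequent minimization or maximization to the caller; it assumes that a and b lie in the current box and that a is not componentwise below b. The names `exhaustive_pair` and `uncovered` are hypothetical, and `uncovered` is a brute-force helper added only to check the result on small boxes.

```python
from itertools import product

def exhaustive_pair(a, b, box):
    """Return a box point outside a^+ and b^-, or None if they cover the box."""
    u = [hi for _, hi in box]
    n = len(box)
    i = next(k for k in range(n) if a[k] > b[k])   # exists because a is not <= b

    def top_with(k, value):
        z = list(u)
        z[k] = value
        return tuple(z)

    if any(b[j] < u[j] for j in range(n) if j != i):   # case 1
        return top_with(i, a[i] - 1)
    for j in range(n):                                 # case 2
        if j != i and a[j] > box[j][0]:
            return top_with(j, a[j] - 1)
    if b[i] < a[i] - 1:                                # case 3
        return top_with(i, a[i] - 1)
    return None                                        # case 4: dual in the box

def uncovered(a, b, box):
    """All box points that are neither >= a nor <= b (brute force, small boxes)."""
    return [x for x in product(*(range(lo, hi + 1) for lo, hi in box))
            if not all(xi >= ai for xi, ai in zip(x, a))
            and not all(xi <= bi for xi, bi in zip(x, b))]
```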



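Random solution($\mathcal{C}, \pi, \mathcal{F}^*, \mathcal{G}^*$). Repeat the following for $k = 1, \ldots, t_1$ times,
where $t_1$ is a constant (say 10): Find a random point $z^k \in \mathcal{C}$, by picking
each coordinate $z^k_i$ randomly from $\{l_i, u_i\}$, $i = 1, \ldots, n$. If $\pi(z^k) = 1$ then
let $(z^k)^* \leftarrow \min_{\mathcal{F}_\pi}(z^k)$, and if $(z^k)^* \notin \mathcal{F}^*$ then return $(z^k)^* \in \mathcal{F}_\pi \setminus \mathcal{F}^*$.
If $\pi(z^k) = 0$ then let $(z^k)^* \leftarrow \max_{\mathcal{G}_\pi}(z^k)$, and if $(z^k)^* \notin \mathcal{G}^*$ then return
$(z^k)^* \in \mathcal{G}_\pi \setminus \mathcal{G}^*$. If $\{(z^1)^*, \ldots, (z^{t_1})^*\} \subseteq \mathcal{F}^* \cup \mathcal{G}^*$ then return FALSE. This
step takes $O(t_1 \max\{n \log |\mathcal{F}^*| + T_{\min},\ n \log |\mathcal{G}^*| + T_{\max}\})$ time, and is used to
check whether $\mathcal{F}^+ \cup \mathcal{G}^-$ covers a large portion of $\mathcal{C}$.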
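A minimal Python sketch of this subroutine is given below. It is an illustration under stated assumptions: minimization and maximization are done by simple unit-step coordinate descent (`descend`, `ascend`) over the original box rather than by the specialized routines, `None` plays the role of FALSE, and all function names are hypothetical.

```python
import random

def descend(z, full_box, pi):
    """Greedy coordinate descent to a minimal satisfying vector below z."""
    z = list(z)
    for i, (lo, _) in enumerate(full_box):
        while z[i] > lo:
            z[i] -= 1
            if not pi(tuple(z)):
                z[i] += 1
                break
    return tuple(z)

def ascend(z, full_box, pi):
    """Greedy coordinate ascent to a maximal non-satisfying vector above z."""
    z = list(z)
    for i, (_, hi) in enumerate(full_box):
        while z[i] < hi:
            z[i] += 1
            if pi(tuple(z)):
                z[i] -= 1
                break
    return tuple(z)

def random_solution(box, full_box, pi, F_star, G_star, trials=10, rng=random):
    """Try a few random corners of the box; return a new boundary vector or None."""
    for _ in range(trials):
        z = tuple(rng.choice(corner) for corner in box)   # z_i drawn from {l_i, u_i}
        if pi(z):
            w = descend(z, full_box, pi)
            if w not in F_star:
                return w
        else:
            w = ascend(z, full_box, pi)
            if w not in G_star:
                return w
    return None
```

When both families are already complete, every trial lands on a known vector and the routine reports failure, which is the signal used to fall through to the decomposition step.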



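Count estimation. For a subset $\mathcal{X} \subseteq \mathcal{F}$ (or $\mathcal{X} \subseteq \mathcal{G}$), use sampling to estimate
the number Est($\mathcal{X}, \mathcal{C}$) of elements of $\mathcal{X} \subseteq \mathcal{F}$ (or $\mathcal{X} \subseteq \mathcal{G}$) that are active with
respect to the current box $\mathcal{C}$, i.e. the elements of the set $\mathcal{X}' \stackrel{\mathrm{def}}{=} \{a \in \mathcal{X} \mid a^+ \cap \mathcal{C} \ne \emptyset\}$
($\mathcal{X}' \stackrel{\mathrm{def}}{=} \{b \in \mathcal{X} \mid b^- \cap \mathcal{C} \ne \emptyset\}$). This can be done as follows. For
$t_2 = O(\log(|\mathcal{F}| + |\mathcal{G}|)/\epsilon)$, pick elements $x^1, \ldots, x^{t_2} \in \mathcal{X}$ at random, and let the random
variable $Y = \frac{|\mathcal{X}|}{t_2} \cdot |\{x^i \in \mathcal{X}' : i = 1, \ldots, t_2\}|$. Repeat this step independently
for a total of $t_3 = O(\log(|\mathcal{F}| + |\mathcal{G}|))$ times to obtain $t_3$ estimates $Y^1, \ldots, Y^{t_3}$,
and let Est($\mathcal{X}, \mathcal{C}$) $= \min\{Y^1, \ldots, Y^{t_3}\}$. This step requires $O(n \log^3 m)$ time.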



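Cleanup($\mathcal{F}, \mathcal{C}$) (Cleanup($\mathcal{G}, \mathcal{C}$)). Set $\mathcal{F}' \leftarrow \{a \in \mathcal{F} \mid a^+ \cap \mathcal{C} \ne \emptyset\}$ (respectively,
$\mathcal{G}' \leftarrow \{b \in \mathcal{G} \mid b^- \cap \mathcal{C} \ne \emptyset\}$), and return $\mathcal{F}'$ (respectively, $\mathcal{G}'$). This step takes
$O(n|\mathcal{F}|)$ (respectively, $O(n|\mathcal{G}|)$).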
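The count estimation and cleanup subroutines can be sketched together in Python. The sketch assumes that $a^+$ denotes the vectors componentwise above a and $b^-$ those below b, so that a is active exactly when $a \le u$ and b exactly when $b \ge l$; it also assumes sampling with replacement. All function names and the parameters `samples` and `repeats` (standing for $t_2$ and $t_3$) are hypothetical.

```python
import random

def active_in_F(a, box):
    """a^+ meets the box iff a_i <= u_i for every coordinate."""
    return all(ai <= hi for ai, (_, hi) in zip(a, box))

def active_in_G(b, box):
    """b^- meets the box iff b_i >= l_i for every coordinate."""
    return all(bi >= lo for bi, (lo, _) in zip(b, box))

def estimate_active(X, box, is_active, samples, repeats, rng=random):
    """Minimum over `repeats` sampled estimates of the number of active vectors."""
    if not X:
        return 0.0
    estimates = []
    for _ in range(repeats):
        hits = sum(is_active(rng.choice(X), box) for _ in range(samples))
        estimates.append(len(X) * hits / samples)
    return min(estimates)

def cleanup(X, box, is_active):
    """Keep only the vectors that are active with respect to the box."""
    return [x for x in X if is_active(x, box)]
```

Taking the minimum of several estimates biases the result downward, so a cleanup is triggered somewhat more readily than with a single sample.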



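Now, we describe the implementation of procedure GEN($\mathcal{C}, \pi, \mathcal{F}, \mathcal{G}$), which is
called initially using $\mathcal{C} \leftarrow \mathcal{C}^*$, $\mathcal{F} \leftarrow \emptyset$ and $\mathcal{G} \leftarrow \emptyset$. At the return of this call, the
families $\mathcal{F}^*$ and $\mathcal{G}^*$, which are initially empty, are extended respectively by the
elements in $\mathcal{F}_\pi$ and $\mathcal{G}_\pi$. Below we assume that $f \in (0, 1)$ is a constant, say 1/2.
The families $\mathcal{F}^o$ and $\mathcal{G}^o$ represent respectively the subfamilies of $\mathcal{F}_\pi$ and $\mathcal{G}_\pi$ that
are generated at each recursion tree node.

Procedure GEN($\mathcal{C}, \pi, \mathcal{F}, \mathcal{G}$):

Input: A box $\mathcal{C} = \mathcal{C}_1 \times \cdots \times \mathcal{C}_n$, a monotone property $\pi$, and subsets $\mathcal{F} \subseteq \mathcal{F}_\pi$ and $\mathcal{G} \subseteq \mathcal{G}_\pi$.

Output: Subsets $\mathcal{F}^o \subseteq \mathcal{F}_\pi \setminus \mathcal{F}$ and $\mathcal{G}^o \subseteq \mathcal{G}_\pi \setminus \mathcal{G}$.

1. $\mathcal{F}^o \leftarrow \emptyset$, $\mathcal{G}^o \leftarrow \emptyset$.
2. While $|\mathcal{F}||\mathcal{G}| \le 1$
3.     $z \leftarrow$ Exhaustive duality($\mathcal{C}, \pi, \mathcal{F}, \mathcal{G}$).
4.     If $z$ = FALSE then return($\mathcal{F}^o, \mathcal{G}^o$).
5.     else if $\pi(z) = 1$ then $\mathcal{F} \leftarrow \mathcal{F} \cup \{z\}$, $\mathcal{F}^o \leftarrow \mathcal{F}^o \cup \{z\}$, $\mathcal{F}^* \leftarrow \mathcal{F}^* \cup \{z\}$.
6.     else $\mathcal{G} \leftarrow \mathcal{G} \cup \{z\}$, $\mathcal{G}^o \leftarrow \mathcal{G}^o \cup \{z\}$, $\mathcal{G}^* \leftarrow \mathcal{G}^* \cup \{z\}$.
7. end while
8. $a \leftarrow$ Random Solution($\mathcal{C}, \pi, \mathcal{F}^*, \mathcal{G}^*$).
9. While ($a \ne$ FALSE) do
10.     if $\pi(a) = 1$ then $\mathcal{F} \leftarrow \mathcal{F} \cup \{a\}$, $\mathcal{F}^o \leftarrow \mathcal{F}^o \cup \{a\}$, $\mathcal{F}^* \leftarrow \mathcal{F}^* \cup \{a\}$.
11.     else $\mathcal{G} \leftarrow \mathcal{G} \cup \{a\}$, $\mathcal{G}^o \leftarrow \mathcal{G}^o \cup \{a\}$, $\mathcal{G}^* \leftarrow \mathcal{G}^* \cup \{a\}$.
12.     $a \leftarrow$ Random Solution($\mathcal{C}, \pi, \mathcal{F}^*, \mathcal{G}^*$).
13. end while
14. $x^* \leftarrow \mathrm{argmin}\{|\mathrm{Ess}(y)| : y \in (\mathcal{F} \cap \mathcal{C}^-) \cup (\mathcal{G} \cap \mathcal{C}^+)\}$.
15. If $x^* \in \mathcal{F}$ then
16.     $i \leftarrow \mathrm{argmax}\{\mathrm{Est}(\{b \in \mathcal{G} : b_j < x^*_j\}, \mathcal{C}) : j \in \mathrm{Ess}(x^*)\}$.
17.     $\mathcal{C}' = \mathcal{C}_1 \times \cdots \times \mathcal{C}_{i-1} \times [x^*_i : u_i] \times \mathcal{C}_{i+1} \times \cdots \times \mathcal{C}_n$.
18.     If Est($\mathcal{G}, \mathcal{C}'$) $\le f \cdot |\mathcal{G}|$ then
19.         $\mathcal{G}' \leftarrow$ Cleanup($\mathcal{G}, \mathcal{C}'$).
20.     else
21.         $\mathcal{G}' \leftarrow \mathcal{G}$.
22.     $(\mathcal{F}_l, \mathcal{G}_l) \leftarrow$ GEN($\mathcal{C}', \pi, \mathcal{F}, \mathcal{G}'$).
23.     $\mathcal{F}^o \leftarrow \mathcal{F}^o \cup \mathcal{F}_l$, $\mathcal{F} \leftarrow \mathcal{F} \cup \mathcal{F}_l$, $\mathcal{F}^* \leftarrow \mathcal{F}^* \cup \mathcal{F}_l$.
24.     $\mathcal{G}^o \leftarrow \mathcal{G}^o \cup \mathcal{G}_l$, $\mathcal{G} \leftarrow \mathcal{G} \cup \mathcal{G}_l$, $\mathcal{G}^* \leftarrow \mathcal{G}^* \cup \mathcal{G}_l$.
25.     $\mathcal{C}'' = \mathcal{C}_1 \times \cdots \times \mathcal{C}_{i-1} \times [l_i : x^*_i - 1] \times \mathcal{C}_{i+1} \times \cdots \times \mathcal{C}_n$.
26.     If Est($\mathcal{F}, \mathcal{C}''$) $\le f \cdot |\mathcal{F}|$ then
27.         $\mathcal{F}'' \leftarrow$ Cleanup($\mathcal{F}, \mathcal{C}''$).
28.     else
29.         $\mathcal{F}'' \leftarrow \mathcal{F}$.
30.     $(\mathcal{F}_r, \mathcal{G}_r) \leftarrow$ GEN($\mathcal{C}'', \pi, \mathcal{F}'', \mathcal{G}$).
31.     $\mathcal{F}^o \leftarrow \mathcal{F}^o \cup \mathcal{F}_r$, $\mathcal{F}^* \leftarrow \mathcal{F}^* \cup \mathcal{F}_r$, $\mathcal{G}^o \leftarrow \mathcal{G}^o \cup \mathcal{G}_r$, $\mathcal{G}^* \leftarrow \mathcal{G}^* \cup \mathcal{G}_r$.
32. else
33-48. Symmetric versions for Steps 16-31 above (details omitted).
49. end if
50. Return ($\mathcal{F}^o, \mathcal{G}^o$).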



The following result, regarding the expected running time of the algorithm,
is inherited from [7].

Proposition 2. The expected number of recursive calls until a new element
in $(\mathcal{F}_\pi \setminus \mathcal{F}^*) \cup (\mathcal{G}_\pi \setminus \mathcal{G}^*)$ is output, or procedure GEN($\mathcal{C}, \pi, \mathcal{F}, \mathcal{G}$) terminates
is .

However, as we shall see from the experiments, the algorithm seems in practice
to behave much more efficiently than indicated by Proposition 2. In fact, in
most of the experiments we performed, we observed an almost linear average delay
(in m) for generating a new point in $(\mathcal{F}_\pi \setminus \mathcal{F}^*) \cup (\mathcal{G}_\pi \setminus \mathcal{G}^*)$.



5 Experimental Results 

We performed a number of experiments to evaluate our implementation on random
instances of the three monotone properties described in Section 2. The
experiments were performed on a 2.2 GHz Pentium 4 processor with 512 MB
of memory. For each monotone property $\pi$, we have limited the
corresponding parameters defining the property to reasonable values such that
the algorithm completes generation of the sets $\mathcal{F}_\pi$ and $\mathcal{G}_\pi$ in reasonable time.
Using larger values of the parameters increases the output size, resulting in large
total time, although the time per output remains almost constant. For each case,
the experiments were performed 5 times, and the numbers shown in the tables
below represent averages.

Tables 1 and 2 show our results for linear systems with n variables and r
inequalities. Each element of the constraint matrix A and the right-hand side
vector b is generated at random from 1 to 15. In the tables we show the output
size, the total time taken to generate the output and the average time per
output vector. The parameter c denotes the maximum value that a variable can
take. The last row of each table gives the ratio of the size of $\mathcal{G}_{\pi_1}$ to the size of
$\mathcal{F}_{\pi_1}$ for comparison with the worst case bound of (3). Note that this ratio is
relatively close to 1, making joint generation an efficient method for generating
both families $\mathcal{F}_{\pi_1}$ and $\mathcal{G}_{\pi_1}$.



Table 1. Performance of the algorithm for property $\pi_1$, where r = 5 and c = 2.
Columns F and G refer to $\mathcal{F}_{\pi_1}$ and $\mathcal{G}_{\pi_1}$.

                            n = 10          n = 20          n = 30          n = 40          n = 50
                            F       G       F       G       F       G       F       G       F       G
Output size (thousands)     0.31    0.19    9.9     5.7     49.6    (*)     127.3   59.5    195.3   74.7
Total time (sec)            4.7     4.7     297     297     1627    1625    5844    5753    (*)     10700
Time/output (msec)          13      24      27      62      29      78      40      103     50      133
Ratio |G|/|F|               0.60            0.57            0.40            0.47            0.38

(*) Entry illegible in the source.



Table 2. Performance of the algorithm for property $\pi_1$, where n = 30 and c = 2.
Columns F and G refer to $\mathcal{F}_{\pi_1}$ and $\mathcal{G}_{\pi_1}$.

                            r = 5           r = 15          r = 25          r = 35          r = 45
                            F       G       F       G       F       G       F       G       F       G
Output size (thousands)     20.4    11.6    68.6    27.8    122.7   43.3    196.6   61.7    317.5   115.5
Total time (sec)            408     408     2244    2242    6495    6482    15857   15856   30170   30156
Time/output (msec)          20      50      32      90      50      158     76      258     75      260
Ratio |G|/|F|               0.57            0.41            0.35            0.31            0.36

Tables 3 and 4 show the results for minimal infrequent/maximal frequent sets.
In the tables, n, r and t denote respectively the number of columns, the number
of rows of the matrix, and the threshold. Each row of the matrix was generated
uniformly at random. As seen from Table 3, for t = 1, 2, the bias between the
numbers of maximal frequent sets and minimal infrequent sets, for the shown
random examples, seems to be large. This makes joint generation an efficient
method for generating minimal infrequent sets, but inefficient for generating
maximal frequent sets for these examples. However, we observed that this bias
in numbers decreases as the threshold t becomes larger. Table 4 illustrates this
on a number of examples in which larger values of the threshold were used.



Table 3. Performance of the algorithm for property $\pi_2$ for threshold t = 1, 2.
Columns F and G refer to $\mathcal{F}_{\pi_2}$ and $\mathcal{G}_{\pi_2}$.

n, r, t                     20,100,1        30,100,1        40,100,1        30,100,2        30,300,2        30,500,2
                            F       G       F       G       F       G       F       G       F       G       F       G
Output size (thousands)     2.9     0.08    51.7    0.1     337.1   0.1     75.4    2.5     386.7   13.5    718.3   27.1
Total time (sec)            60      55      1520    769     22820   3413    1962    1942    13269   13214   28824   28737
Time/output (msec)          20      690     30      7742    68      34184   28      770     33      979     40      1062
Ratio |G|/|F|               0.0280          0.0019          0.0002          0.0335          0.0350          0.0377



Table 4. Performance of the algorithm for property $\pi_2$ for large threshold values.
Columns F and G refer to $\mathcal{F}_{\pi_2}$ and $\mathcal{G}_{\pi_2}$.

n, r, t                     30,300,3        30,300,5        30,300,7        30,300,9        30,1000,20      30,1000,25
                            F       G       F       G       F       G       F       G       F       G       F       G
Output size (thousands)     403.3   73.6    362.6   134.7   269.0   100.1   199.1   74.0    491.3   145.7   398.1   114.5
Total time (sec)            7534    7523    6511    6508    4349    4346    3031    3029    13895   13890   9896    9890
Time/output (msec)          19      102     18      48      17      43      15      41      28      95      25      86
Ratio |G|/|F|               0.1826          0.3715          0.3719          0.3716          0.2965          0.2877



Figures 1 and 2 show how the output rate changes for minimal feasible/maximal
infeasible solutions of linear systems and for minimal infrequent/maximal
frequent sets, respectively. For minimal feasible solutions, we
can see that the output rate changes almost linearly as the number of outputs
increases. This is not the case for the maximal infeasible solutions, where the
algorithm efficiency decreases (the generation problem for maximal infeasible
solutions is NP-hard). For minimal infrequent and maximal frequent sets, Figure 2
shows that the output rate increases very slowly. This suggests that
the algorithm behaves much better in practice than the quasi-polynomial bound
stated in Proposition 2.

Table 5 shows the results for maximal sparse/minimal non-sparse boxes with
dimension n, for a set of r random points, threshold t, and upper bound c on the
coordinate of each point. As in the case of frequent sets, the bias between the
numbers $|\mathcal{F}_{\pi_3}|$ and $|\mathcal{G}_{\pi_3}|$ is large for t = 0 but seems to decrease with larger values
of the threshold. In fact, the table shows two examples in which the number of
minimal non-sparse boxes is larger than the number of maximal sparse boxes.

Fig. 1. Average time per output, as a function of the number of outputs for
minimal feasible/maximal infeasible solutions of linear systems, with c = 5 and
(n, r) = (30, 100), (50, 300).

Fig. 2. Average time per output, as a function of the number of outputs for
minimal infrequent/maximal frequent sets, with n = 30, r = 1000 and t = 5, 10.



We are not aware of any implementation of an algorithm for generating maximal
sparse boxes except for [13], which presents some experiments for n = 2 and t = 0.
Experiments in [13] indicated that the algorithm suggested there is almost linear
in the number of points r. Figure 3 illustrates a similar behaviour exhibited
by our algorithm. In the figure, we show the total time required to generate all
the 2-dimensional maximal empty boxes, as the number of points is increased
from 10,000 to 60,000, for two different values of the upper bound c.



Table 5. Performance of the algorithm for property $\pi_3$ with n = 7 and upper bound c = 5.
Columns F and G refer to $\mathcal{F}_{\pi_3}$ and $\mathcal{G}_{\pi_3}$.

r, t                        100,0           300,0           500,0           300,2           300,6           300,10
                            F       G       F       G       F       G       F       G       F       G       F       G
Output size (thousands)     16.1    0.1     49.1    0.3     72.9    0.5     228.5   88.7    373.4   466.3   330.4   456.5
Total time (sec)            932     623     2658    1456    3924    2933    8731    8724    17408   17404   16156   16156
Time/output (msec)          29      6237    27      4866    27      5889    19      98      23      37      24      35
Ratio |G|/|F|               0.0062          0.0061          0.0068          0.3881          1.2488          1.3818



As mentioned in Section 4, it is possible in general to implement the procedures
$\min_{\mathcal{F}_\pi}(z)$ and $\max_{\mathcal{G}_\pi}(z)$ using the coordinate descent method, but more
efficient implementations can be obtained if we specialize these procedures to
the monotone property under consideration. Figure 4 compares the two different
implementations for the property $\pi_3$. Clearly, the gain in performance increases
as the upper bound c increases.



Let us finally point out that we have observed that the algorithm tends to run
more efficiently when the sets $\mathcal{F}_\pi$ and $\mathcal{G}_\pi$ become closer in size. This observation
is illustrated in Figure 5, which plots the average time per output (i.e. total
time to output all the elements of $\mathcal{F}_\pi \cup \mathcal{G}_\pi$ divided by $|\mathcal{F}_\pi \cup \mathcal{G}_\pi|$) versus the ratio
$|\mathcal{G}_\pi|/|\mathcal{F}_\pi|$. This indicates that, when the elements of the sets $\mathcal{F}_\pi$ and $\mathcal{G}_\pi$ are more
uniformly distributed along the space, it becomes easier for the joint generation
algorithm to find a new vector not in the already generated sets $\mathcal{F}^* \subseteq \mathcal{F}_\pi$ and
$\mathcal{G}^* \subseteq \mathcal{G}_\pi$.

Fig. 3. Total generation time as a function of the number of points r for maximal
boxes with n = 2, t = 0, and c = 100, 1000.

Fig. 4. Comparing general versus specialized minimization for property $\pi_3$.
Each plot shows the average CPU time/maximal empty box generated versus
the upper bound c, for n = 5 and r = 500.

Fig. 5. Average generation time as a function of the ratio $|\mathcal{G}_\pi|/|\mathcal{F}_\pi|$, for
properties $\pi_2$ and $\pi_3$.

6 Conclusion

We have presented an efficient implementation for a quasi-polynomial algorithm
for jointly generating the families $\mathcal{F}_\pi$ and $\mathcal{G}_\pi$ of minimal satisfying and maximal
non-satisfying vectors for a given monotone property $\pi$. We provided an experimental
evaluation of the algorithm on three different monotone properties. Our
experiments indicate that the algorithm behaves much more efficiently than its
worst-case time complexity indicates. The algorithm seems to run faster on
instances where the families $\mathcal{F}_\pi$ and $\mathcal{G}_\pi$ are not very biased in size. Finally, our
experiments also indicate that such non-bias in size is not a rare situation (for
random instances), despite the fact that inequalities of the form (3)-(5) may hold
in general.

Abstract. We present a simple pedagogical graph theoretical description
of the Lempel, Even, and Cederbaum (LEC) planarity method based on
concepts due to Thomas. A linear-time implementation of the LEC method
using the PC-tree data structure of Shih and Hsu is provided and described
in detail. We report on an experimental study involving this
implementation and other available linear-time implementations of planarity
algorithms.



1 Introduction 

The first linear-time planarity testing algorithm is due to Hopcroft and Tarjan [10].
Their algorithm is an ingenious implementation of the method of Auslander
and Parter [1] and Goldstein [9]. Some notes on the algorithm were made
by Deo [7], and significant additional details were presented by Williamson [20,
21] and Reingold, Nievergelt, and Deo [16].

The second method of planarity testing proven to achieve linear time is due 
to Lempel, Even, and Cederbaum (LEC) [13]. This method was optimized to 
linear time thanks to the st-numbering algorithm of Even and Tarjan [8] and 
the PQ-tree data structure of Booth and Lueker (BL) [2]. Chiba, Nishizeki, Abe, 
and Ozawa [6] augmented the PQ-tree operations so that a planar embedding is 
also computed in linear time. 

All these algorithms are widely regarded as being quite complex [6,12,18]. 
Recent research efforts have resulted in simpler linear-time algorithms proposed 
by Shih and Hsu (SH) [11,18,17] and by Boyer and Myrvold (BM) [4,5]. These 
algorithms implement LEC method and present similar and very interesting 
ideas. Each algorithm uses its own data structure to efficiently maintain relevant 
information on the (planar) already examined portion of the graph. 

The description of SH algorithm made by Thomas [19] provided us with the
key concepts to give a simple graph theoretical description of LEC method. This
description increases the understanding of BL, SH, and BM algorithms, all based
on LEC method.

Section 2 contains definitions of the key ingredients used by LEC method.
In Section 3, an auxiliary algorithm is considered. LEC method is presented in
Section 4 and an implementation of SH algorithm is described in Section 5. This
implementation is available at http://www.ime.usp.br/~coelho/sh/ and, as
far as we know, is the only available implementation of SH algorithm, even
though the algorithm was proposed about 10 years ago. Finally, Section 6 reports
on an experimental study.

2 Frames, XY-Paths, XY-Obstructions and Planarity

This section contains the definitions of some concepts introduced by Thomas [19]
in his presentation of SH algorithm. We use these concepts in the coming sections
to present both LEC method and our implementation of SH algorithm.

Let H be a planar graph. A subgraph F of H is a frame of H if F is induced
by the edges incident to the external face of a planar embedding of H (Figs. 1(a)
and 1(b)).




Fig. 1. (a) A graph H. (b) A frame of H. (c) A path P in a frame. (d) The complement
of P.



If G is a connected graph, H is a planar induced subgraph of G and F is a
frame of H, then we say that F is a frame of H in G if it contains all vertices
of H that have a neighbor in $V_G \setminus V_H$. Not every planar induced subgraph
of a graph G has a frame in G (Fig. 2(a)), and not every induced subgraph of a
planar graph G has a frame in G (Fig. 2(b)). The connection between frames
and planarity is given by the following lemma.

Lemma 1 (Thomas [19]). If H is an induced subgraph of a planar graph G
such that $G - V_H$ is connected, then H has a frame in G. ■

Let F be a frame of H and P be a path in F. The basis of P is the subgraph
of F formed by all blocks of F which contain at least one edge of P. Let
$C_1, C_2, \ldots, C_k$ be the blocks in the basis of P. For $i = 1, 2, \ldots, k$,
let $P_i := P \cap C_i$ and, if $C_i$ is a cycle, let $\bar{P}_i := C_i \setminus P_i$,
otherwise let $\bar{P}_i := P_i$. The complement of P in F is the path
$\bar{P}_1 \cup \bar{P}_2 \cup \ldots \cup \bar{P}_k$, which is denoted by $\bar{P}$.
If $E_P = \emptyset$ then $\bar{P} := P$ (Figs. 1(c) and 1(d)).

Fig. 2. (a) Subgraphs of $K_{3,3}$ and $K_5$ induced by the solid edges have no
frames. (b) Subgraph induced by the solid edges has no frame in the graph.



Let W be a set of vertices in H and Z be a set of edges in H. A vertex v
in H sees W through Z if there is a path in H from v to a vertex in W with all
edges in Z. Let X and Y be sets of vertices of a frame F of H. A path P in F
with basis S is an XY-path (Fig. 3) if

(p1) the endpoints of P are in X;

(p2) each vertex of S that sees X through $E_F \setminus E_S$ is in P;

(p3) each vertex of S that sees Y through $E_F \setminus E_S$ is in $\bar{P}$;

(p4) no component of $F - V_S$ contains vertices both in X and in Y.
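
The relation "v sees W through Z" is used throughout the rest of the text, so a small sketch may help fix ideas. The following Python fragment is only an illustration under our own assumptions: the graph is a dictionary of adjacency lists, edges of Z are stored as `frozenset` pairs, and the name `sees` is ours. It performs a breadth-first search that is allowed to cross only edges of Z; a vertex of W trivially sees W through any Z (the empty path).

```python
from collections import deque

def sees(adj, v, W, Z):
    """Return True if v reaches some vertex of W using only edges in Z.

    adj: dict vertex -> iterable of neighbours (undirected graph H)
    W:   set of target vertices
    Z:   set of allowed edges, each stored as frozenset({a, b})
    """
    seen = {v}
    queue = deque([v])
    while queue:
        a = queue.popleft()
        if a in W:              # the empty path counts: a vertex of W sees W
            return True
        for b in adj[a]:
            if b not in seen and frozenset((a, b)) in Z:
                seen.add(b)
                queue.append(b)
    return False
```

With this predicate, conditions (p2) and (p3) can be checked naively by calling `sees` for each vertex of the basis with Z set to the edges of F outside the basis.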




Fig. 3. In (a), (b), (c), and (d), let P denote the thick path; its basis is shadowed.
(a) P is not an XY-path since it violates (p3). (b) P is an XY-path. (c) P is not an
XY-path since it violates (p2). (d) P is not an XY-path since it violates (p4).

There are three types of objects that prevent an XY-path from existing. They
are called XY-obstructions and are defined as

(o1) a 5-tuple $(C, v_1, v_2, v_3, v_4)$ where C is a cycle of F and $v_1$, $v_2$, $v_3$,
and $v_4$ are distinct vertices in C that appear in this order in C, such that
$v_1$ and $v_3$ see X through $E_F \setminus E_C$ and $v_2$ and $v_4$ see Y through
$E_F \setminus E_C$;

(o2) a 4-tuple $(C, v_1, v_2, v_3)$ where C is a cycle of F and $v_1$, $v_2$, and $v_3$ are
distinct vertices in C that see X and Y through $E_F \setminus E_C$;

(o3) a 4-tuple $(v, K_1, K_2, K_3)$ where $v \in V_F$ and $K_1$, $K_2$, and $K_3$ are distinct
components of $F - v$ such that each $K_i$ contains vertices in X and in Y.

The existence of an XY-obstruction is related to non-planarity as follows.

Lemma 2 (Thomas [19]). Let H be a planar connected subgraph of a graph G
and w be a vertex in $V_G \setminus V_H$ such that $G - V_H$ and $G - (V_H \cup \{w\})$
are connected. Let F be a frame of H in G, let X be the set of neighbors of w
in $V_F$ and let Y be the set of neighbors of $V_G \setminus (V_H \cup \{w\})$ in $V_F$.
If F has an XY-obstruction then G has a subdivision of $K_{3,3}$ or $K_5$.

Sketch of the proof: An XY-obstruction of type (o1) or (o3) indicates a
$K_{3,3}$-subdivision. An XY-obstruction of type (o2) indicates either a
$K_5$-subdivision or a $K_{3,3}$-subdivision (Fig. 4). ■



3 Finding XY-Paths or XY-Obstructions

Let F be a connected frame and let X and Y be subsets of $V_F$. If F is a tree,
then finding an XY-path or an XY-obstruction is an easy task. The following
algorithm finds either an XY-path or an XY-obstruction in F by manipulating a
tree that represents F.

Let $\mathcal{B}$ be the set of blocks of a connected graph H and let T be the tree
with vertex set $\mathcal{B} \cup V_H$ and edges of the form Bv where $B \in \mathcal{B}$
and $v \in V_B$. We call T the block tree of H (Fig. 5) (the leaves in $V_H$ make the
definition slightly different from the usual one). Each node of T in $\mathcal{B}$ is
called a C-node and each node of T in $V_H$ is called a P-node.
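
As an illustration of the block tree just defined, the sketch below computes the blocks of a small connected graph with the classical depth-first search on an edge stack, and then lists the tree edges Bv. It is a minimal sketch under our own assumptions: the graph is simple and given as a dictionary of adjacency lists, recursion depth is not a concern, and the function name `block_tree` is ours rather than part of the implementation discussed in this paper.

```python
def block_tree(adj):
    """Blocks of a connected simple graph and the edges of its block tree.

    adj: dict vertex -> list of neighbours (undirected).
    Returns (blocks, tree_edges): blocks is a list of frozensets of vertices,
    tree_edges is a set of pairs (block_index, vertex), one per incidence.
    """
    disc, low = {}, {}
    edge_stack, blocks = [], []

    def dfs(u, parent):
        disc[u] = low[u] = len(disc)
        for v in adj[u]:
            if v == parent:
                continue
            if v not in disc:                 # tree edge
                edge_stack.append((u, v))
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if low[v] >= disc[u]:         # u separates the subtree of v
                    block = set()
                    while True:
                        a, b = edge_stack.pop()
                        block.update((a, b))
                        if (a, b) == (u, v):
                            break
                    blocks.append(frozenset(block))
            elif disc[v] < disc[u]:           # back edge to an ancestor
                edge_stack.append((u, v))
                low[u] = min(low[u], disc[v])

    dfs(next(iter(adj)), None)
    tree_edges = {(i, v) for i, blk in enumerate(blocks) for v in blk}
    return blocks, tree_edges
```

For a triangle a, b, c with a pendant edge cd, the sketch finds the two blocks {a, b, c} and {c, d}; the resulting tree has two C-nodes, four P-nodes and five edges.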



Algorithm Central(F, X, Y). Receives a connected frame F and subsets X and Y
of $V_F$ and returns either an XY-path or an XY-obstruction in F.

Let $T_0$ be the block tree of F. The algorithm is iterative and each iteration
begins with a subtree T of $T_0$, subsets $X_T$ and $Y_T$ of $V_T$ and subsets W and Z
of $V_F$. The sets $X_T$ and $Y_T$ are formed by the nodes of T that see X and Y
through $E_{T_0} \setminus E_T$, respectively. The sets W and Z contain the P-nodes of
$T_0$ that see X and Y through $E_{T_0} \setminus E_T$, respectively. At the beginning of
the first iteration, $T = T_0$, $X_T = X$, $Y_T = Y$, $W = X$, and $Z = Y$. Each iteration
consists of the following:

Fig. 4. Some situations with an XY-obstruction $(C, v_1, v_2, v_3)$ of type (o2). The
dashed lines indicate paths. In each $u_i v_i$-path, $u_i$ is the only vertex in
$V_G \setminus (V_H \cup \{w\})$, for $i = 1, 2, 3$. (a) Subdivision of $K_5$ coming from an
XY-obstruction. (b) Subdivisions of $K_{3,3}$ coming from an XY-obstruction.
(c) Concrete example of an XY-obstruction leading to a $K_{3,3}$-subdivision.





Fig. 5. A graph and its block tree. 



Case 1: Each leaf of T is in $X_T \cap Y_T$ and T is a path.

Let R be the set of P-nodes of T.

For each C-node C of T, let $X_C := V_C \cap (W \cup R)$ and $Y_C := V_C \cap (Z \cup R)$.

Case 1A: Each C-node C of T has a path $P_C$ containing $X_C$ and internally
disjoint from $Y_C$.

Let $P_T$ be the path in F obtained by the concatenation of the paths in
$\{P_C : C \text{ is a C-node of } T\}$.

Let P be a path in F with endpoints in X, containing $P_T$ and containing
$V_C \cap W$ for each block C in the basis of P.

Return P and stop.

Case 1B: There exists a C-node C of T such that no path containing $X_C$
is internally disjoint from $Y_C$.

Let $v_1$, $v_2$, $v_3$, and $v_4$ be distinct vertices of C appearing in this order
in C, such that $v_1$ and $v_3$ are in $X_C$ and $v_2$ and $v_4$ are in $Y_C$.

Return $(C, v_1, v_2, v_3, v_4)$ and stop.

Case 2: Each leaf of T is in $X_T \cap Y_T$ and there exists a node v of T with degree
greater than 2.

Case 2A: v is a C-node.

Let C be the block of F corresponding to v.

Let $v_1$, $v_2$, and $v_3$ be distinct P-nodes adjacent to v in T.

Return $(C, v_1, v_2, v_3)$ and stop.

Case 2B: v is a P-node.

Let $C_1$, $C_2$, and $C_3$ be distinct C-nodes adjacent to v in T.

Let $K_1$, $K_2$, and $K_3$ be components of $F - v$ such that $C_i$ is a block of
$K_i + v$ ($i = 1, 2, 3$).

Return $(v, K_1, K_2, K_3)$ and stop.

Case 3: There exists a leaf f of T not in $X_T \cap Y_T$.

Let u be the node of T adjacent to f.

Let $T' := T - f$.

Let $X_{T'} := (X_T \setminus \{f\}) \cup \{u\}$ if f is in $X_T$; otherwise $X_{T'} := X_T$.

Let $Y_{T'} := (Y_T \setminus \{f\}) \cup \{u\}$ if f is in $Y_T$; otherwise $Y_{T'} := Y_T$.

Let $W' := W \cup \{u\}$ if f is in $X_T$ and u is a P-node; otherwise $W' := W$.

Let $Z' := Z \cup \{u\}$ if f is in $Y_T$ and u is a P-node; otherwise $Z' := Z$.

Start a new iteration with $T'$, $X_{T'}$, $Y_{T'}$, $W'$, and $Z'$ in the roles of
T, $X_T$, $Y_T$, W, and Z, respectively.
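
The reductions of Case 3 can be sketched in a few lines. The Python fragment below is a simplified illustration, not the algorithm itself: it repeatedly removes a leaf outside $X_T \cap Y_T$ and transfers its membership to its neighbor. The sets W and Z and the distinction between P-nodes and C-nodes are left out for brevity, and the name `prune_leaves` and the dictionary-of-sets representation of the tree are our own assumptions.

```python
def prune_leaves(tree, X_T, Y_T):
    """Apply the Case 3 reduction until no leaf outside X_T & Y_T remains.

    tree: dict node -> set of adjacent nodes (a tree with at least 2 nodes)
    X_T, Y_T: sets of nodes of the tree
    Returns the surviving tree and the updated sets (inputs are not modified).
    """
    tree = {n: set(nb) for n, nb in tree.items()}
    X_T, Y_T = set(X_T), set(Y_T)
    work = [n for n in tree if len(tree[n]) == 1]     # current leaves
    while work:
        f = work.pop()
        if f not in tree or len(tree[f]) != 1:
            continue                  # already removed, or no longer a leaf
        if f in X_T and f in Y_T:
            continue                  # this leaf stays until the end
        (u,) = tree[f]                # the unique neighbour of f
        del tree[f]
        tree[u].discard(f)
        if f in X_T:
            X_T.discard(f)
            X_T.add(u)
        if f in Y_T:
            Y_T.discard(f)
            Y_T.add(u)
        if len(tree[u]) == 1:
            work.append(u)            # u has just become a leaf
    return tree, X_T, Y_T
```

The leaves of the returned tree are the terminals mentioned below; Case 1 or Case 2 would then be decided on this tree.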



The execution of the algorithm consists of a sequence of "reductions" made by
Case 3 followed by an occurrence of either Case 1 or Case 2. At the beginning of
the last iteration, the leaves of T are called terminals. The concept of a terminal
node is used in a fundamental way by the SH algorithm. The following theorem
follows from the correctness of the algorithm.

Theorem 1 (Thomas [19]). If F is a frame of a connected graph and X and Y
are subsets of $V_F$, then either there exists an XY-path or an XY-obstruction
in F. ■

4 LEC Planarity Testing Method



One of the ingredients of the LEC method is a certain ordering $v_1, v_2, \ldots, v_n$
of the vertices of the given graph G such that, for $i = 1, \ldots, n$, the induced
subgraphs $G[\{v_1, \ldots, v_i\}]$ and $G[\{v_{i+1}, \ldots, v_n\}]$ are connected.
Equivalently, G is connected and, for $i = 2, \ldots, n-1$, vertex $v_i$ is adjacent to
$v_j$ and $v_k$ for some j and k such that $1 \le j < i < k \le n$. A numbering of the
vertices according to such an ordering is called a LEC-numbering of G. If the
ordering is such that $v_1 v_n$ is an edge of the graph, the numbering is called an
st-numbering [13]. One can show that every biconnected graph has a LEC-numbering.
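
The second characterization above is easy to check directly. The sketch below, written under the assumption that G is connected and given as a dictionary of neighbor sets, tests whether a given ordering satisfies the neighbor condition, and whether it is additionally an st-numbering. The function names are ours and serve only as an illustration.

```python
def is_lec_numbering(adj, order):
    """Check the neighbour condition for the ordering v_1, ..., v_n.

    adj:   dict vertex -> set of neighbours (undirected, assumed connected)
    order: list of all vertices; order[0] plays the role of v_1
    """
    pos = {v: i for i, v in enumerate(order)}
    n = len(order)
    for i in range(1, n - 1):                     # v_2, ..., v_{n-1}
        v = order[i]
        has_lower = any(pos[u] < i for u in adj[v])
        has_higher = any(pos[u] > i for u in adj[v])
        if not (has_lower and has_higher):
            return False
    return True

def is_st_numbering(adj, order):
    """A LEC-numbering in which v_1 and v_n are adjacent."""
    return is_lec_numbering(adj, order) and order[-1] in adj[order[0]]
```

On the 4-cycle a, b, c, d the ordering (a, b, c, d) is an st-numbering, while (a, c, b, d) is not even a LEC-numbering because c has no lower numbered neighbor.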

The LEC method examines the vertices of a given biconnected graph, one by one,
according to a LEC-numbering. In each iteration, the method tries to extend a
frame of the subgraph induced by the already examined vertices. If this is not
possible, the method declares that the graph is non-planar and stops.




Fig. 6. (a) A frame F and an XY-path P in thick edges. (b) F after moving the
elements of X to one side and the elements of Y to the other side of P. Squares mark
vertices in $V_F \setminus V_{\bar{P}}$ that do not see $Y \setminus V_{\bar{P}}$ through
$E_F \setminus E_S$, where $\bar{P}$ is the complement and S is the basis of P. (c) F
together with the edges with one endpoint in F and the other in w. (d) A frame
of H + w.

Method LEC(G). Receives a biconnected graph G and returns YES if G
is planar, and NO otherwise.

Number the vertices of G according to a LEC-numbering. Each iteration
starts with an induced subgraph H of G and a frame F of H in G. At the
beginning of the first iteration, H and F are empty. Each iteration consists of
the following:

Case 1: H = G.

Return YES and stop.

Case 2: H ≠ G.

Let w be the smallest numbered vertex in $G - V_H$.

Let $X := \{u \in V_F : uw \in E_G\}$.

Let $Y := \{u \in V_F : \text{there exists } v \in V_G \setminus (V_H \cup \{w\}) \text{ such that } uv \in E_G\}$.

Case 2A: There exists an XY-obstruction in F.

Return NO and stop.

Case 2B: There exists an XY-path P in F.

Let $\bar{P} := \langle w_0, w_1, \ldots, w_k \rangle$ be the complement of P and let S be the
basis of P.

Let R be the set of vertices in $V_F \setminus V_{\bar{P}}$ that do not see
$Y \setminus V_{\bar{P}}$ through $E_F \setminus E_S$ (Figs. 6(a) and 6(b)).

Let F' be the graph resulting from the addition of w and the edges $w w_0$
and $w w_k$ to the graph $F - R$ (Fig. 6(c)).

Let H' := H + w (Fig. 6(d)).

Start a new iteration with H' and F' in the roles of H and F, respectively.



The following invariants hold during the execution of the method.

(lec1) H and $G - V_H$ are connected graphs;

(lec2) F is a frame of H in G.

These invariants together with Lemmas 1 and 2 and Theorem 1 imply the
correctness of the method and the following classical theorem.

Theorem 2 (Kuratowski). A graph is planar if and only if it has no subdivision
of $K_{3,3}$ or $K_5$. ■

Three of the algorithms mentioned in the introduction are very clever linear-time
implementations of the LEC method. BL use an st-numbering instead of an
arbitrary LEC-numbering of the vertices and use a PQ-tree to store F. SH use
a DFS-numbering and a PC-tree to store F. BM also use a DFS-numbering and
use yet another data structure to store F. One can easily use the previous
description to design a quadratic implementation of the LEC method.

5 Implementation of SH Algorithm

The SH algorithm, like all other linear-time planarity algorithms, is quite complex
to implement. The goal of this section is to share our experience in implementing it.

Let G be a connected graph. A DFS-numbering is a numbering of the vertices
of G obtained from searching a DFS-tree of G in post-order. The SH algorithm uses
a DFS-numbering instead of a LEC-numbering. If the vertices of G are ordered
according to a DFS-numbering, then the graph $G[\{i+1, \ldots, n\}]$ is connected, for
$i = 1, \ldots, n$. A DFS-numbering does not guarantee that $H := G[\{1, \ldots, i-1\}]$
is connected; if there exists a frame F of H and H is not connected, then F is also
not connected. Moreover, to compute (if it exists) a frame of H + i, it is necessary
to compute an XY-path for each component of F that contains a neighbor of i.

Let v be a vertex of F and C be a block of F containing v and, if possible,
a higher numbered vertex. We say v is active if v sees $X \cup Y$ through
$E_F \setminus E_C$.

PC-Tree 

The data structure proposed by SH to store F is called a PC-tree and is here
denoted by T. Conceptually, a PC-tree is an arborescence representing the
relevant information of the block forest of F. It consists of P-nodes and C-nodes.
There is a P-node for each active vertex of F and a C-node for each cycle of F.
We refer to a P-node by the corresponding vertex of F. There is an arc from a
P-node u to a P-node v in T if and only if uv is a block of F. Each C-node c has
a circular list, denoted RBC(c), with all P-nodes in its corresponding cycle of F,
in the order they appear in this cycle. This list starts with the largest numbered
P-node in it, which is called its head. The head of the list has a pointer to c. Each
P-node appears in at most one RBC in a non-head cell. It might appear in the
head cell of several RBCs. Each P-node v has a pointer nonhead-RBC-cell(v)
to the non-head cell in which it appears in an RBC. This pointer is null if
there is no such cell. The name RBC stands for representative bounding cycle
(Figs. 7(a)-(c)).

Let T' be the rooted forest whose node set coincides with the node set of T
and whose arc set is defined as follows. Every arc of T is an arc of T'. Besides these
arcs, there are some virtual arcs: for every C-node c, there is an arc in T' from
c to the P-node which is the head of RBC(c) and there is an arc to c from all
the other P-nodes in RBC(c) (Fig. 7(d)). In the exposition ahead, we use on
nodes of T concepts such as parent, child, leaf, ancestor, descendant and so on.
By these, we mean their counterparts in T'.

Forest T' is not really kept by the implementation. However, during each 
iteration, some of the virtual arcs are determined and temporarily stored to 
avoid traversing parts of the PC-tree more than once. So, each non-head cell 
in an RBC and each C-node has a pointer to keep its virtual arc, when it is 
determined. The pointer is null while the virtual arc is not known. 

Values h(u) and b(v)

For each vertex u of G, denote by h(u) the largest numbered neighbor of u in G.
This value can be computed together with a DFS-numbering, and can be stored
in an array at the beginning of the algorithm.

Fig. 7. (a) A graph G, a DFS-numbering of its vertices and, in thick edges, a frame
F of G[1..11] in G. (b) Black vertices in frame F are inactive. (c) The PC-tree T for
F, with RBCs indicated by dotted lines. (d) Rooted tree T' corresponding to T;
virtual arcs are dashed.

For each node v of T, let $b(v) := \max\{h(u) : u \text{ is a descendant of } v \text{ in } T\}$.
For a C-node of T, this number does not change during the execution of the algorithm.
On the other hand, for a P-node of T, this number might decrease because its set
of descendants might shrink when T is modified. So, in the implementation, the
value of b(c) for a C-node c is computed and stored when c is created. It is the
maximum over b(u) for all u in the path in T corresponding to the XY-path in
F that originated c. One way to keep b(v) for a P-node v is, at the beginning of
the algorithm, to build an adjacency list for G sorted by the values of h, and to
keep, during the algorithm, for each P-node of T, a pointer to the last traversed
vertex in its sorted adjacency list. Each time the algorithm needs to access b(v)
for a P-node v, it moves this pointer ahead on the adjacency list (if necessary)
until (1) it reaches a vertex u which has v as its parent, in which case b(v) is the
maximum between h(v) and b(u), or (2) it reaches the end of the list, in which
case b(v) = h(v).
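
The DFS-numbering and the values h(u) discussed in this section can be sketched as follows. This is an illustration only, not the code of our implementation: it assumes a connected graph with at least two vertices stored as adjacency lists, uses an explicit stack instead of recursion, and computes h in a separate pass for clarity, although the text notes that it can be computed together with the numbering. The name `dfs_numbering` is ours.

```python
def dfs_numbering(adj, root):
    """Post-order DFS numbering 1..n and the values h(u).

    adj: dict vertex -> list of neighbours (connected undirected graph)
    Returns (num, h): num[v] is the number of v and h[v] is the largest
    number among the neighbours of v.
    """
    num, visited = {}, {root}
    stack = [(root, iter(adj[root]))]
    while stack:
        v, it = stack[-1]
        for u in it:
            if u not in visited:          # descend to an unvisited child
                visited.add(u)
                stack.append((u, iter(adj[u])))
                break
        else:                             # all neighbours of v were explored
            stack.pop()
            num[v] = len(num) + 1         # v is numbered after its subtree
    h = {v: max(num[u] for u in adj[v]) for v in adj}
    return num, h
```

Since a vertex is numbered only after its whole subtree, the root receives number n and every other vertex has a higher numbered neighbor (its parent), which is why $G[\{i+1, \ldots, n\}]$ stays connected.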

Traversal of the PC-tree 

The traversal of the PC-tree T, inspired by Boyer and Myrvold [4,5], is done
as follows. To go from a P-node u to a node v which is an ancestor of u in T,
one starts with x = u and repeats the following procedure until x = v. If x is a
P-node and nonhead-RBC-cell(x) is null, move x up to its parent. If x is a
P-node and nonhead-RBC-cell(x) is non-null, its virtual arc is either null or not.
If it is non-null, move x to the C-node pointed to by the virtual arc. Otherwise,
starting at nonhead-RBC-cell(x), search the RBC in an arbitrary direction until
either (1) the head of the RBC is reached or (2) a cell in the RBC with its virtual
arc non-null is reached or (3) a P-node y such that b(y) > w is reached. If (3)
happens before (1), search the RBC, restarting at nonhead-RBC-cell(x), but in
the other direction, until either (1) or (2) happens. If (1) happens, move x to
the C-node pointed to by the head. If (2) happens, move x to the C-node pointed
to by the virtual arc. In any case, search all visited cells in the RBC again, setting
their virtual arcs to x. Also, set the virtual arc from x to the head of its RBC.

At several moments, the implementation traverses parts of T. For each
node of T, there is a mark to tell whether it was already visited in this iteration
or not. By visited, we mean a node which was assigned to x in the traversal
process described above. Every time a new node is included in T, it is marked as
unvisited. Also, during each phase of the algorithm where nodes are marked as
visited, the algorithm stacks each visited node and, at the end of the phase,
unstacks them all, undoing the marks. This way, at the beginning of each iteration,
all nodes of T are marked as unvisited.

The same trick with a stack is used to unset the virtual arcs. When a virtual
arc for a node v is set in the traversal, v is included in a second stack and, at
the end of the iteration, this stack is emptied and all corresponding virtual arcs
are set back to null.
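
The two stacks described above can be sketched as follows. The classes `Node` and `Scratch` and their fields are hypothetical names chosen for this illustration; the point is only that undoing the marks costs time proportional to the number of nodes actually touched, rather than to the size of T.

```python
class Node:
    """A PC-tree node reduced to the two fields used by the trick."""

    def __init__(self, name):
        self.name = name
        self.visited = False
        self.virtual_arc = None


class Scratch:
    """Remembers which nodes were touched so that they can be reset cheaply."""

    def __init__(self):
        self.visited_stack = []     # nodes marked in the current phase
        self.arc_stack = []         # nodes whose virtual arc is currently set

    def visit(self, node):
        if not node.visited:
            node.visited = True
            self.visited_stack.append(node)

    def set_virtual_arc(self, node, target):
        if node.virtual_arc is None:
            self.arc_stack.append(node)
        node.virtual_arc = target

    def end_phase(self):
        """Undo the visited marks set since the previous call."""
        while self.visited_stack:
            self.visited_stack.pop().visited = False

    def end_iteration(self):
        """Undo the remaining marks and all virtual arcs."""
        self.end_phase()
        while self.arc_stack:
            self.arc_stack.pop().virtual_arc = None
```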

Terminals 

The next concept, introduced by SH, is the key to searching efficiently for
an XY-obstruction. A node t of T is a terminal if

(t1) b(t) > w;

(t2) t has a descendant in T that is a neighbor of w in G;

(t3) no proper descendant of t satisfies properties (t1) and (t2) simultaneously.
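
Conditions (t1) to (t3) can be checked naively as in the sketch below. This is not the linear-time search used by the implementation: it simply walks up parent pointers of the rooted forest. It assumes that a node counts as a descendant of itself in (t2), and the names `find_terminals`, `parent` and `neighbors_of_w` are ours.

```python
def find_terminals(parent, b, neighbors_of_w, w):
    """Nodes satisfying (t1), (t2) and (t3), computed by brute force.

    parent:         dict node -> parent node (None for a root)
    b:              dict node -> value b(node)
    neighbors_of_w: nodes of the forest that are neighbours of w in G
    w:              number of the vertex being added
    """
    # (t2): nodes with a neighbour of w among their descendants.
    has_nbr = set()
    for v in neighbors_of_w:
        x = v
        while x is not None and x not in has_nbr:
            has_nbr.add(x)
            x = parent[x]
    # Nodes satisfying (t1) and (t2) together.
    good = {x for x in has_nbr if b[x] > w}
    # (t3): discard every node with a good proper descendant.
    above_good = set()
    for g in good:
        x = parent[g]
        while x is not None and x not in above_good:
            above_good.add(x)
            x = parent[x]
    return good - above_good
```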

Because of the orientation of the PC-tree, one of the terminals from Section 4 
might not be a terminal here. This happens when one of the terminals from 
Section 4 is an ancestor in the PC-tree of all others. An extra effort in the 
implementation is necessary to detect and deal with this possible extra terminal. 

The first phase of an iteration of the implementation is the search for the
terminals. This phase consists of, for each neighbor v of w such that v < w,
traversing T starting at v until a visited node z is met. (Mark all nodes visited
in the traversal; this will be left implicit from now on.) On the way, if a node
u such that b(u) > w is seen, mark the first such node as a candidate-terminal
and, if z is marked as such, unmark it. The result from this phase is the list of
terminals for each component of F.

Search for XY-Obstructions

The second phase is the search for an XY-obstruction. First, if there are three
or more terminals for some component of F, then there is an XY-obstruction of
type either (o2) or (o3) in F (Case 2 of Central algorithm). We omit the details
on how to effectively find it because this is a terminal case of the algorithm.
Second, if there are at most two terminals for each component of F, then, for
each component of F with at least one terminal, do the following. If it has two
terminals, call them $t_1$ and $t_2$. If it has only one terminal, call it $t_1$ and let $t_2$
be the highest numbered vertex in this component. Test each C-node c on the
path in T between $t_1$ and $t_2$ for an XY-obstruction of type (o1) (Case 1B of
Central algorithm). The test decides whether or not the cycle in F corresponding
to c plays the role of C in (o1). Besides these tests, the implementation performs
one more test in the case of two terminals. The least common ancestor m of $t_1$
and $t_2$ in T is tested for an XY-obstruction of type (o2), if m is a C-node, or an
XY-obstruction of type (o3), if m is a P-node. This extra test arises from the
possible undetected terminal.

To perform each of these tests, the implementation keeps one more piece
of information for each C-node c. Namely, it computes, in each iteration, the
number of P-nodes in RBC(c) that see X through $E_F \setminus E_C$, where C is the cycle
in F corresponding to c. This number is computed in the first phase. Each
C-node has a counter that, at the beginning of each iteration, has value 1 (to
account for the head of its RBC). During the first phase, every time an RBC is
entered through a P-node which was unvisited, the counter of the corresponding
C-node is incremented by 1. As a result, at the end of the first phase, each
(relevant) C-node knows its number.

For the test of a C-node c, the implementation searches RBC(c), starting
at the head of RBC(c). It moves in an arbitrary direction, stopping only when
it finds a P-node u (distinct from the head) such that b(u) > w. On the way,
the implementation counts the number of P-nodes traversed. If only one step is
taken, it starts again at the head of RBC(c) and moves in the other direction
until it finds a P-node u such that b(u) > w, counting the P-nodes, as before.
If the counter obtained matches the number computed for that C-node in the
first phase, it passes the test; otherwise, except for two cases, there is an
XY-obstruction of type (o1). The first of the two exceptional cases happens when
there are exactly two terminals and c is their least common ancestor. The
second happens when there is exactly one terminal and c is
(potentially) the upper block in which the XY-path ends. The test required
in these two cases is slightly different, but similar, and might give rise to an
XY-obstruction of type (o1) or (o2). We omit the details.

PC-Tree update 

The last phase refers to Case 2B in the LEC method. It consists of the modification
of T according to the new frame. First, one has to add to T a P-node for w.
Then, parts of T referring to a component with no neighbor of w remain the
same. Parts of T referring to a component with exactly one neighbor of w are
easily adjusted. So we concentrate on the parts of T referring to components
with two or more neighbors of w. Each of these originates a new C-node. For
each of them, the second phase determined the basis of an XY-path, which is
given by a path Q in T. Path Q consists basically of the nodes visited during
the second phase. Let us describe the process in the case where there is only one
terminal. The case of two terminals is basically a double application of this one.

Call c the new C-node being created. Start RBC(c) with its head cell, which
refers to w, and points back to c. Traverse Q once again, going up in T. For
each P-node u in Q such that nonhead-RBC-cell(u) is null, if b(u) > w (here
we refer to the possibly new value of b(u), as u might have lost a child in the
traversal), then an RBC cell is created, referring to u. It is included in RBC(c)
and nonhead-RBC-cell(u) is set to point to it. For each P-node u such that
nonhead-RBC-cell(u) is non-null, let c' be its parent in T. Concatenate to
RBC(c) a part of RBC(c'), namely, the part of RBC(c') that was not used to
get to c' in any traversal in the second phase. To be able to concatenate without
traversing this part, one can use a simple data structure proposed by Boyer and
Myrvold [5,4] to keep a doubly linked list. (The data structure consists of
cells with two indistinct pointers, one for each direction. To move in a certain
direction, one starts by making the first move in that direction; then, to keep
moving in the same direction, it is enough to always choose the pointer that does
not lead back to the previous cell.)
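
The traversal rule for cells with two indistinct pointers can be illustrated as follows. The sketch covers only the rule stated in the parenthesis above, on a circular list; it does not reproduce the concatenation step, and the names `Cell`, `connect` and `walk` are our own.

```python
class Cell:
    """A list cell whose two links carry no 'next' or 'previous' meaning."""

    def __init__(self, label):
        self.label = label
        self.link = [None, None]


def connect(a, b):
    """Make a and b neighbours, using any free link slot of each cell."""
    a.link[a.link.index(None)] = b
    b.link[b.link.index(None)] = a


def walk(start, first):
    """Yield the labels of a circular list once, leaving start towards first."""
    prev, cur = start, first
    yield start.label
    while cur is not start:
        yield cur.label
        # Keep the direction: take the link that does not lead back to prev.
        nxt = cur.link[0] if cur.link[0] is not prev else cur.link[1]
        prev, cur = cur, nxt
```

Because no cell records an orientation, two lists can be joined by rewriting only the links at the junction, regardless of the direction in which each list was originally built.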

During the traversal of Q, one can compute the value of b(c). Its value is
simply the maximum of b(u) over all nodes u traversed. This completes the
description of the core of the implementation.

Certificate 

To be able to produce a certificate for its answer, the implementation carries
still more information. Namely, it carries the DFS-tree that originated the
DFS-numbering of the vertices and, for each C-node, a combinatorial description
of a planar embedding of the corresponding biconnected component where the
P-nodes in its RBC all appear on the boundary of the same face. We omit the
details, but one can find at http://www.ime.usp.br/~coelho/sh/ the complete
implementation, which also certifies its answer.

6 Experimental Study 

The main purpose of this study was to confirm the linear-time behavior of our
implementation and to acquire a deeper understanding of the SH algorithm. Boyer
et al. [3] made a similar experimental study that does not include the SH algorithm.

The LEDA platform has a planarity library that includes implementations
of Hopcroft and Tarjan's (HT) and BL algorithms and an experimental study
comparing them. The library includes the following planar graph generator
routines: maximal_planar_map and random_planar_map. Neither of them generates
plane maps according to the uniform distribution [14], but they are well-known
and widely used. The following classes of graphs obtained through these routines
are used in the LEDA experimental study:

(G1) random planar graphs;

(G2) graphs with a $K_{3,3}$: six vertices from a random planar graph are randomly
chosen and edges among them are added to form a $K_{3,3}$;

(G3) graphs with a $K_5$: five random vertices from a random planar graph are
chosen and all edges among them are added to form a $K_5$;

(G4) random maximal planar graphs;

(G5) random maximal planar graphs plus a random edge connecting two
non-adjacent vertices.

Our experimental study extends the one presented in LEDA by including our
implementation of the SH algorithm, made on the LEDA platform, and an
implementation of the BM algorithm developed in C. We performed all empirical
tests used in LEDA to compare the HT and BL implementations [15]. The
experimental environment was a PC running GNU/Linux (RedHat 7.1) on a Celeron
700MHz with 256MB of RAM. The compiler was gcc 2.96 with options
-DLEDA_CHECKING_OFF -O.

In the experiments [15, p. 123], BL performs the planarity test 4 to 5 times
faster than our SH implementation in all five classes of graphs above. For the
planar classes (G1) and (G4), it runs 10 times faster than our SH to do the
planarity test and build the embedding. On (G2) and (G3), it is worse than our
SH, requiring 10% to 20% more time for testing and finding an obstruction. On
(G5), it runs within 65% of our SH time for testing and finding an obstruction.
For the planarity test only, HT runs within 70% of our SH time for the planar
classes (G1) and (G4), but performs slightly worse than our SH on (G2) and
(G3). On (G5), it outperforms our SH, running in 40% of its time. For the
planar classes (G1) and (G4), HT is around 4 times faster when testing and
building the embedding. (The HT implementation in question has no option to
produce an obstruction when the input graph is non-planar; indeed, there is no
linear-time implementation known for finding the obstruction for it [22].) BM
performs better than all the others but, remember, it is the only one implemented
in C and not on the LEDA platform. It runs in around 4% of the time spent by our
SH for testing and building the embedding and, for the non-planar classes, when
building the obstruction, it runs in about 15% of our SH time on (G2) and (G3)
and in about 10% of our SH time on (G5). (There is no implementation of BM
available that only does the planarity testing.) The execution time used in these
comparisons is the average CPU time on a set of 10 graphs from each class.

Figure 8 shows the average CPU time of each implementation on (a) (G1) for
only testing planarity (against BM with testing and building an embedding, as
there is no testing-only version available), (b) (G2) for testing and finding an
obstruction (HT is not included in this table, for the reason mentioned above),
(c) (G4) for testing and building an embedding, and (d) for testing and finding an
obstruction (again, HT excluded).

We believe the results discussed above and shown in the table are preliminary
and still not conclusive because our implementation is still a prototype. (Also, in
our opinion, it is not fair to compare LEDA implementations with C
implementations.)

Our current understanding of the SH algorithm makes us believe that we can
design a new implementation which would run considerably faster. Our belief
comes, first, from the fact that our current code was developed to solve the
planarity testing only, and was later modified to also produce a certificate
for its answer to the planarity test. Building an implementation from the start
with both the test and the certificate in mind would be the right way, we believe,
to obtain more efficient code. Second, the adaptation to build the certificate
(especially the embedding when the input is planar) made us notice several details

Fig. 8. Empirical results comparing SH, HT, BL, and BM implementations.



in the way the implementation of the test was done that could be improved.
Even so, we decided to go forward with the implementation of the complete
algorithm (test plus certificate) so that we could understand it completely before
rewriting it from scratch. The description given in Section 5 already incorporates
some of the simplifications we thought of for our new implementation. It is our
intention to reimplement the SH algorithm from scratch.

Abstract. In this paper we present a new fast approximation algorithm
for the Uniform Metric Labeling Problem. This is an important classification
problem that occurs in many applications which consider the
assignment of objects into labels, in a way that is consistent with some
observed data that includes the relationship between the objects.

The known approximation algorithms are based on solutions of large
linear programs and are impractical for moderate and large size instances.
We present an 8 log n-approximation algorithm analyzed by a
primal-dual technique which, although its factor is greater than those of
the previous algorithms, can be applied to large sized instances. We obtained
experimental results on computer-generated and image processing
instances with the new algorithm and two other LP-based approximation
algorithms. For these instances our algorithm presents a considerable
gain in computational time and the error ratio, when it was possible to
compare, was less than 2% from the optimum.



1 Introduction 

In a traditional classification problem, we wish to assign each of n objects to one
of m labels (or classes). This assignment must be consistent with some observed
data that includes pairwise relationships among the objects to be classified.
More precisely, the classification problem can be defined as follows: Let P be
a set of objects, L a set of labels, $w : P \times P \to \mathbb{R}^+$ a weight function,
$d : L \times L \to \mathbb{R}^+$ a distance function and $c : P \times L \to \mathbb{R}^+$ an
assignment cost function. A labeling of P over L is a function $\phi : P \to L$. The
assignment cost of a labeling $\phi$ is the sum $\sum_{i \in P} c(i, \phi(i))$ and the
separation cost of a labeling $\phi$ is the sum
$\sum_{i,j \in P} d(\phi(i), \phi(j))\, w(i,j)$. The function w indicates the strength
of the relation between two objects, and the function d indicates the similarity
between two labels. The cost of a labeling $\phi$ is the sum of the assignment cost
and the separation cost. The Metric Labeling Problem (MLP) consists of finding
a labeling of the objects into labels with minimum total cost. Throughout this
paper, we denote the size of the sets P and L by n and m, respectively.

* This work has been partially supported by MCT/CNPq Project ProNEx grant
664107/97-4, FAPESP grants 01/12166-3, 02/05715-3, and CNPq grants 300301/98-
7, 470608/01-3, 464114/00-4, and 478818/03-3.

In this paper we consider the Uniform Labeling Problem (ULP), a special
case of MLP, where the distance d(i, j) has a constant value if $i \ne j$, and 0
otherwise, for all $i, j \in L$.
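
To make the cost function concrete, the following sketch evaluates a labeling on a toy instance. It is an illustration under assumptions of our own: the separation cost counts each unordered pair of objects once, pairs without a stored weight have weight 0, the uniform distance takes the constant value 1, and the names `labeling_cost` and `uniform_distance` are ours.

```python
from itertools import combinations

def uniform_distance(a, b):
    """Uniform metric: constant (here 1) for distinct labels, 0 otherwise."""
    return 0 if a == b else 1

def labeling_cost(objects, phi, c, w, d=uniform_distance):
    """Assignment cost plus separation cost of the labeling phi.

    objects: list of objects
    phi:     dict object -> label
    c:       dict (object, label) -> assignment cost
    w:       dict frozenset({i, j}) -> weight; missing pairs have weight 0
    d:       function (label, label) -> distance
    """
    assignment = sum(c[i, phi[i]] for i in objects)
    separation = sum(
        d(phi[i], phi[j]) * w.get(frozenset((i, j)), 0)
        for i, j in combinations(objects, 2)   # each unordered pair once
    )
    return assignment + separation
```

For three objects with weights 5 on the pair {1, 2} and 2 on the pair {2, 3}, assigning the first two objects to one label and the third to another pays the assignment costs plus the weight 2 of the only separated pair with nonzero weight.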

The MLP has several applications, many listed by Kleinberg and Tardos 
[12]. Some applications occur in image processing [6,2,8], biometrics [1], text 
categorization [3], etc. One example in image processing is the
restoration of images degraded by noise. In this case, an image can be seen as
a grid of pixels, where each pixel is an object that must be classified with a color. The
assignment cost is given by the similarity between the new and old coloring, and
the separation cost is given by the color of a pixel and the color of its neighbors.

The uniform and the metric labeling problems generalize the Multiway Cut
Problem, a known NP-hard problem [7]. These labeling problems were first
introduced by Kleinberg and Tardos [12], who present an $O(\log m \log\log m)$-approximation
algorithm for the MLP, and a 2-approximation algorithm for the
ULP using the probabilistic rounding technique over a solution of a linear program.
The associated LP has $O(n^2 m)$ constraints and $O(n^2 m)$ variables.

Chekuri et al. [4] developed a new linear programming formulation that is
better for the non-uniform case. However, for the uniform case it maintains a
2-approximation factor and has a larger time complexity. This time complexity is a
consequence of solving a linear program with $O(n^2 m^2)$ variables.

Gupta and Tardos [10] present a formulation for the Truncated Labeling
Problem, a case of the MLP where the labels are positive integers and the
metric distance between labels $i$ and $j$ is given by the truncated linear norm,
$d(i,j) = \min\{M, |i - j|\}$, where $M$ is the maximum value allowed. They present
an algorithm that is a 4-approximation algorithm for the Truncated Labeling
Problem and a 2-approximation for the ULP. The algorithm generates a network
flow problem instance where the weights of edges come from the assignment and
separation costs of the original problem. The resulting graph has $O(n^2 m)$ edges
where the Min Cut algorithm is applied $O((m/M)(\log Q_0 + \log \varepsilon^{-1}))$ times in order
to obtain a $(4 + \varepsilon)$-approximation, where $Q_0$ is the cost of the initial solution.

The LP-based and the flow-based algorithms have a large time complexity,
which makes them impractical for moderate and large sized instances, as
occur in most of the applications cited above.

In this paper, we present a fast approximation algorithm for the Uniform
Metric Labeling Problem and prove that it is an $8 \log n$-approximation algorithm.
We also compare the practical performance of this algorithm with the LP-based
algorithms of Chekuri et al. and Kleinberg and Tardos. Although this algorithm
has a higher approximation factor, the solutions obtained have an error ratio that
was less than 2% from the optimum solution, when it was possible to compare,
and the time improvement over the linear programming based algorithms is
considerable.

In section 2, we present a proof for a greedy algorithm for the Set Cover
Problem via a primal dual analysis. Then we analyze the case when the greedy
choice is relaxed to an approximated one. In section 3, we present a general
algorithm for the Uniform Labeling Problem using an algorithm for the Set
Cover Problem. This last problem is solved approximately via an algorithm for
the Quotient Cut Problem [13]. In section 4, we compare the presented algorithm
with the LP-based algorithms. Finally, in section 5, we present the concluding
remarks.



2 A Primal-Dual Analysis for a Greedy Algorithm for 
the Set Cover Problem 



In this section, we present a primal-dual version for a greedy approximation 
algorithm for the Set Cover Problem, including a proof of its approximation 
factor. This proof is further generalized for the case when the greedy choice is 
relaxed to an approximated one. 

The Set Cover Problem is a well known optimization problem that generalizes
many others. It is defined as follows: given a set $E = \{e_1, e_2, \ldots, e_n\}$ of
elements and a family of subsets $\mathcal{S} = \{S_1, S_2, \ldots, S_m\}$, where each $S_j \subseteq E$ has a cost
$w_j$, $j \in \{1, \ldots, m\}$, the goal is to find a set $Sol \subseteq \{1, \ldots, m\}$
that minimizes $\sum_{j \in Sol} w_j$ subject to $\bigcup_{j \in Sol} S_j = E$.

In [5], Chvátal presents a greedy $H_g$-approximation algorithm for the
Set Cover Problem, where $g$ is the number of elements in the largest set in
$\mathcal{S}$ and $H_g$ is the value of the harmonic function of degree $g$. This algorithm
iteratively chooses the set with minimum amortized cost, that is, the cost of the
set divided by the number of its non-covered elements. Once a set is chosen to be
in the solution, all the elements in this set are considered as covered. In what
follows we describe this algorithm more precisely.

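Greedy Algorithm for the Set Cover Problem $(E, \mathcal{S}, w)$

1. $Sol \leftarrow \emptyset$; $U \leftarrow E$.

2. While $U \neq \emptyset$ do

3.     $j' \leftarrow \arg\min_{j : S_j \cap U \neq \emptyset} \frac{w_j}{|S_j \cap U|}$

4.     $Sol \leftarrow Sol \cup \{j'\}$

5.     $U \leftarrow U \setminus S_{j'}$

6. return $Sol$.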
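The greedy rule above translates almost directly into Python. The sketch below is illustrative only: the function name `greedy_set_cover`, the list-of-sets input format and the small example instance are assumptions made for this illustration.

```python
def greedy_set_cover(elements, sets, weights):
    """Return the indices of the chosen sets, in the order they are picked."""
    uncovered = set(elements)
    solution = []
    while uncovered:
        # Only sets that still cover something new are candidates.
        candidates = [j for j, s in enumerate(sets) if s & uncovered]
        if not candidates:
            raise ValueError("the family does not cover all elements")
        # Amortized cost: weight divided by the number of newly covered elements.
        best = min(candidates,
                   key=lambda j: weights[j] / len(sets[j] & uncovered))
        solution.append(best)
        uncovered -= sets[best]
    return solution

E = {1, 2, 3, 4, 5}
S = [{1, 2, 3}, {3, 4}, {4, 5}, {1, 2, 3, 4, 5}]
w = [3.0, 1.0, 2.0, 6.0]
sol = greedy_set_cover(E, S, w)
```

On this instance the amortized costs of the picked sets are 0.5, 1.5 and 2, so the sets are chosen in the order 1, 0, 2.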



To show the primal dual algorithm for the Set Cover Problem, we first present
a formulation using binary variables $x_j$ for each set $S_j$, where $x_j = 1$ if and only
if $S_j$ is chosen to enter in the solution. The formulation consists of finding $x$ that

\[
\begin{array}{lll}
\text{minimize} & \sum_{j} w_j x_j & \\
\text{s.t.} & \sum_{j : e \in S_j} x_j \geq 1 & \forall e \in E, \\
 & x_j \in \{0, 1\} & \forall j \in \{1, \ldots, m\},
\end{array}
\]

and the dual of the relaxed version consists of finding $\alpha$ that

\[
\begin{array}{lll}
\text{maximize} & \sum_{e \in E} \alpha_e & \\
\text{s.t.} & \sum_{e \in S_j} \alpha_e \leq w_j & \forall j \in \{1, \ldots, m\}, \\
 & \alpha_e \geq 0 & \forall e \in E.
\end{array}
\]

The greedy algorithm can be rewritten as a primal dual algorithm, with
a similar set of events. The algorithm uses a set $U$ containing the elements not
covered in each iteration, initially the set $E$, and a variable $T$ with a notion of
time associated with each event. The algorithm also uses (dual) variables $\alpha_e$ for
each element $e$, starting at zero, and increasing in each iteration for the elements
in $U$.

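Primal-Dual Algorithm for the Set Cover Problem $(E, \mathcal{S}, w)$

1. $U \leftarrow E$; $T \leftarrow 0$; $\alpha_e \leftarrow 0, \forall e \in E$; $Sol \leftarrow \emptyset$.

2. while $U \neq \emptyset$ do

3.     grow uniformly the time $T$ and, at the same rate, grow $\alpha_e$ for each $e \in U$
       until there exists an index $j$ such that $\sum_{e \in S_j \cap U} \alpha_e$ is equal to $w_j$.

4.     $Sol \leftarrow Sol \cup \{j\}$.

5.     $U \leftarrow U \setminus S_j$.

6. return $Sol$.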
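The continuous growth in step 3 can be simulated event by event. The sketch below assumes that all uncovered elements share the same dual value $T$, so a set $S_j$ becomes tight at time $w_j / |S_j \cap U|$. The function name and the example data are hypothetical, chosen only for illustration.

```python
def primal_dual_set_cover(elements, sets, weights):
    """Simulate the primal-dual algorithm; return (solution, alpha)."""
    uncovered = set(elements)
    alpha = {}
    solution = []
    while uncovered:
        # Uncovered elements all have dual value T, so set j becomes tight
        # when |S_j & U| * T = w_j.
        tight_time = {j: weights[j] / len(s & uncovered)
                      for j, s in enumerate(sets) if s & uncovered}
        j = min(tight_time, key=tight_time.get)
        T = tight_time[j]
        # The duals of the newly covered elements stop growing at time T.
        for e in sets[j] & uncovered:
            alpha[e] = T
        solution.append(j)
        uncovered -= sets[j]
    return solution, alpha

E = {1, 2, 3, 4, 5}
S = [{1, 2, 3}, {3, 4}, {4, 5}, {1, 2, 3, 4, 5}]
w = [3.0, 1.0, 2.0, 6.0]
sol, alpha = primal_dual_set_cover(E, S, w)
```

Here the events occur at times 0.5, 1.5 and 2, and the duals sum to 6, which equals the weight of the returned cover.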



Lemma 1. The sequence of events executed by the Greedy algorithm and by the
Primal-Dual algorithm is the same.

Proof. Note that the value of $\alpha_e$ when $\sum_{e \in S_j \cap U} \alpha_e = w_j$ is equal to the amortized
cost. Since $\alpha_e$ grows uniformly, it is clear that the algorithm, in each iteration,
chooses a set with minimum amortized cost. □

This lemma implies that all solutions obtained by the Greedy algorithm can 
be analyzed by the primal dual techniques. 

Lemma 2. Let $S_j = \{e_1, e_2, \ldots, e_k\}$ and $\alpha_i$ the time variable associated to the
item $e_i$, generated by the Primal-Dual algorithm. If $\alpha_1 \leq \alpha_2 \leq \ldots \leq \alpha_k$, then
$\sum_{i=l}^{k} \alpha_l \leq w_j$ for each $l \in \{1, \ldots, k\}$.

Proof. In the moment just before the time $\alpha_l$, all variables $\alpha_i$, $l \leq i \leq k$, have the
same value, $\alpha_l$, and they are all associated with uncovered elements. Suppose the
lemma is false. In this case, $\sum_{i=l}^{k} \alpha_l > w_j$ and, in an instant strictly before $\alpha_l$,
the set $S_j$ would enter in the solution and all of its elements would have $\alpha < \alpha_l$,
which is a contradiction, since $S_j$ has at least one element with value greater than or equal to $\alpha_l$.
Therefore, the lemma is valid. □

The Primal-Dual algorithm returns a primal solution $Sol$, with value
$val(Sol) := \sum_{j \in Sol} w_j$, such that $val(Sol) = \sum_{e \in E} \alpha_e$. Note that the variable
$\alpha$ may be dual infeasible. If there exists a value $\gamma$ such that $\alpha/\gamma$ is dual feasible,
i.e. $\sum_{e \in S_j} \alpha_e/\gamma \leq w_j$ for each $j \in \{1, \ldots, m\}$, then, by the weak duality theorem,
$val(Sol) \leq \gamma\, OPT$. The idea to prove the approximation factor is to find a
value $\gamma$ for which $\alpha/\gamma$ is dual feasible.

Theorem 3. The Primal-Dual algorithm for the Set Cover Problem is an
$H_g$-approximation algorithm, where $g$ is the size of the largest set in $\mathcal{S}$.

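Proof. Consider an arbitrary set $S = \{e_1, \ldots, e_k\}$ with $k$ elements and cost $w$
and let $\alpha_i$ be the time variable associated with element $e_i$, $i = 1, \ldots, k$. Without
loss of generality, assume that $\alpha_1 \leq \ldots \leq \alpha_k$.

If $\gamma$ is a value such that $\alpha/\gamma$ is dual feasible then

\[ \sum_{i=1}^{k} \alpha_i / \gamma \leq w, \qquad (1) \]

thus a necessary condition is

\[ \gamma \geq \frac{\sum_{i=1}^{k} \alpha_i}{w}. \qquad (2) \]

Applying Lemma 2 for each value of $l$ we have

\[
\begin{array}{lll}
l = 1, & k\,\alpha_1 \leq w, & \alpha_1 / w \leq 1/k, \\
l = 2, & (k - 1)\,\alpha_2 \leq w, & \alpha_2 / w \leq 1/(k - 1), \\
\vdots & & \\
l = k, & \alpha_k \leq w, & \alpha_k / w \leq 1.
\end{array}
\]

Adding the inequalities above we have

\[ \frac{\sum_{i=1}^{k} \alpha_i}{w} \leq H_k. \qquad (3) \]

Therefore, when $\gamma = H_g$ we obtain that $\alpha/\gamma$ is dual feasible. □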
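The dual-fitting argument can be checked numerically on a small instance. The sketch below is an illustration under assumptions: it recomputes the duals with the greedy rule (each element receives the amortized cost of the set that covers it), and the helper names `greedy_duals`, `harmonic` and `is_dual_feasible` are hypothetical.

```python
def greedy_duals(elements, sets, weights):
    """Dual value of each element: the amortized cost at which it is covered."""
    uncovered, alpha = set(elements), {}
    while uncovered:
        j = min((j for j, s in enumerate(sets) if s & uncovered),
                key=lambda j: weights[j] / len(sets[j] & uncovered))
        price = weights[j] / len(sets[j] & uncovered)
        for e in sets[j] & uncovered:
            alpha[e] = price
        uncovered -= sets[j]
    return alpha

def harmonic(g):
    return sum(1.0 / i for i in range(1, g + 1))

def is_dual_feasible(alpha, sets, weights, scale=1.0, tol=1e-9):
    # Dual constraint for every set: sum of (scaled) duals <= weight.
    return all(sum(alpha[e] for e in s) / scale <= wj + tol
               for s, wj in zip(sets, weights))

E = {1, 2, 3, 4, 5}
S = [{1, 2, 3}, {3, 4}, {4, 5}, {1, 2, 3, 4, 5}]
w = [3.0, 1.0, 2.0, 6.0]
alpha = greedy_duals(E, S, w)
g = max(len(s) for s in S)
feasible_raw = is_dual_feasible(alpha, S, w)
feasible_scaled = is_dual_feasible(alpha, S, w, scale=harmonic(g))
```

In this example the unscaled duals violate the constraint of the set {1, 2, 3} (3.5 > 3), while dividing by $H_5$ restores feasibility.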



Now, let us assume a small modification in the previous algorithm. Instead
of choosing, in step 3 of the greedy algorithm, the set with minimum amortized
cost, we choose a set $S_j$ with amortized cost at most $f$ times greater than the
minimum. That is, if $S_{j^*}$ is a set with minimum amortized cost then the following
inequality is valid

\[ \frac{w_j}{|S_j \cap U|} \leq f\, \frac{w_{j^*}}{|S_{j^*} \cap U|}. \]

This modification can be understood, in the primal-dual version of the algorithm,
as a permission for the sum $\sum_{e \in S_j \cap U} \alpha_e$ to exceed the value of $w_j$ by at most
a factor of $f$. We denote by $\mathcal{A}_f$ the algorithms with this modification.

Lemma 4. Let $S_j = \{e_1, e_2, \ldots, e_k\}$ and $\alpha_i$ the time variable associated with
the item $e_i$ generated by an $\mathcal{A}_f$ algorithm. If $\alpha_1 \leq \alpha_2 \leq \ldots \leq \alpha_k$ then
$\sum_{i=l}^{k} \alpha_l \leq f\, w_j$ for each $l \in \{1, \ldots, k\}$.

Proof. Suppose the lemma is false. In this case, there exists an execution where
$\sum_{i=l}^{k} \alpha_l > f\, w_j$ and, in an instant $T < \alpha_l$, the set $S_j$ would enter in the solution,
which is a contradiction. □

The following theorem can be proved analogously to Theorem 3.

Theorem 5. If $g$ is the size of the largest set in $\mathcal{S}$ then any $\mathcal{A}_f$ algorithm is an
$f H_g$-approximation algorithm.

3 A New Algorithm for the Uniform Labeling Problem

The algorithm for the ULP uses ideas similar to those presented by Jain et al. [11] for
a facility location problem. To present this idea, we use the notion of a star.
A star is a connected graph where only one vertex, called the center of the
star, can have degree greater than one. Jain et al. [11] present an algorithm that
iteratively selects a star with minimum amortized cost, where the center of each
star is a facility.

In the Uniform Labeling Problem, we can consider a labeling $\phi : P \to L$ as
a set of stars, each one with a label in the center. The algorithm for the ULP
iteratively selects stars of reduced amortized cost, until all objects have been
covered.

Given a star $S = \{l, U_S\}$ for the ULP, where $l \in L$ and $U_S \subseteq P$, we denote
by $c_S$ the cost of the star $S$, which is defined as

\[ c_S = \sum_{u \in U_S} c_{ul} + \frac{1}{2} \sum_{u \in U_S,\, v \in (P \setminus U_S)} w_{uv}, \]

that is, the cost to assign each element of $U_S$ to $l$ plus the cost to separate each
element of $U_S$ from each element of $P \setminus U_S$. We pay just half of the separation
cost, because the other half will appear when we label the elements in $P \setminus U_S$.

In the description of the main algorithm for the Uniform Labeling Problem,
we denote by $\mathcal{S}_{ULP}$ the set of all possible stars of an instance, $U$ the set of
unclassified objects, $(E_{sc}, \mathcal{S}_{sc}, w')$ an instance for the Set Cover Problem, $\mathcal{U}$ a
collection and $\phi$ a labeling. In the following, we describe the algorithm, called
Gul, using an approximation algorithm for the Set Cover Problem as parameter.




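Algorithm Greedy Uniform Labeling (Gul) $(L, P, c, w, \mathcal{A}_{sc})$

$L$ := set of labels; $P$ := set of objects.
$c_{ui}$ := assignment cost between a label $i$ and an object $u$.
$w_{uv}$ := separation cost between an object $u$ and an object $v$.
$\mathcal{A}_{sc}$ := a $\beta$-approximation algorithm for the Set Cover Problem.

1. $E_{sc} \leftarrow P$

2. $\mathcal{S}_{sc} \leftarrow \{U_S : S = \{l, U_S\} \in \mathcal{S}_{ULP}\}$

3. $w'_S \leftarrow c_S + \frac{1}{2} \sum_{u \in U_S,\, v \in P \setminus U_S} w_{uv}, \quad \forall S = \{l, U_S\} \in \mathcal{S}_{ULP}$

4. $\mathcal{U} \leftarrow \mathcal{A}_{sc}(E_{sc}, \mathcal{S}_{sc}, w')$, let $\{U_{S_1}, \ldots, U_{S_t}\} = \mathcal{U}$

5. $U \leftarrow P$

6. for $k \leftarrow 1$ to $t$ do

7.     $\phi(i) \leftarrow l, \ \forall i \in U_{S_k} \cap U$, where $l = S_k \cap L$

8.     $U \leftarrow U \setminus U_{S_k}$

9. return $\phi$.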
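The steps of Gul can be traced on a tiny instance by enumerating every star explicitly and using the plain greedy rule in place of $\mathcal{A}_{sc}$. This brute-force sketch is exponential in the number of objects and is meant only as an illustration; the function names, the dictionary representation and the example data are assumptions, not the authors' implementation.

```python
from itertools import combinations

def sep(w, u, v):
    # Separation cost, looked up symmetrically.
    return w.get((u, v), w.get((v, u), 0))

def star_weight(label, members, objects, c, w):
    # w'_S: assignment costs plus the full separation cost leaving the star.
    assign = sum(c[(u, label)] for u in members)
    cut = sum(sep(w, u, v) for u in members for v in objects if v not in members)
    return assign + cut

def gul_bruteforce(labels, objects, c, w):
    # Steps 1-3: enumerate all stars (tiny inputs only).
    stars = [(l, frozenset(sub))
             for l in labels
             for r in range(1, len(objects) + 1)
             for sub in combinations(objects, r)]
    weight = {s: star_weight(s[0], s[1], objects, c, w) for s in stars}
    # Step 4: the greedy set cover rule plays the role of A_sc.
    uncovered, chosen = set(objects), []
    while uncovered:
        best = min((s for s in stars if s[1] & uncovered),
                   key=lambda s: weight[s] / len(s[1] & uncovered))
        chosen.append(best)
        uncovered -= best[1]
    # Steps 5-8: label the still unclassified members of each chosen star.
    phi, unclassified = {}, set(objects)
    for label, members in chosen:
        for u in members & unclassified:
            phi[u] = label
        unclassified -= members
    return phi

objects = ["p1", "p2", "p3"]
labels = ["A", "B"]
c = {("p1", "A"): 1, ("p1", "B"): 10,
     ("p2", "A"): 10, ("p2", "B"): 1,
     ("p3", "A"): 9, ("p3", "B"): 2}
w = {("p1", "p2"): 2, ("p1", "p3"): 1, ("p2", "p3"): 3}
phi = gul_bruteforce(labels, objects, c, w)
```

On this instance the first star picked is label B with objects p2 and p3 (amortized cost 3), followed by label A with p1.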

3.1 Analysis of the Algorithm

To analyze the algorithm we use the following notation:

- $val_{ULP}(\phi)$: value, in the ULP, of the labeling $\phi$.

- $val_{sc}(\mathcal{U})$: value, in the Set Cover Problem, of a collection $\mathcal{U}$.

- $\phi_{OPT}$: an optimum labeling for the ULP.

- $\mathcal{U}_{OPT}$: an optimum solution for the Set Cover Problem.

- $SC(\phi)$: a collection $\{U_{S_1}, U_{S_2}, \ldots, U_{S_k}\}$ related with a labeling $\phi =
  \{S_1, S_2, \ldots, S_k\}$ where $S_i$ is the star $\{l, U_{S_i}\}$.



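Lemma 6. If $\phi$ is a solution returned by the algorithm Gul and $\mathcal{U}$ the solution
returned by algorithm $\mathcal{A}_{sc}$ for the Set Cover Problem (at step 4) then

\[ val_{ULP}(\phi) \leq val_{sc}(\mathcal{U}). \]

Proof.

\[
\begin{aligned}
val_{sc}(\mathcal{U}) &= \sum_{S=\{l,U_S\} : U_S \in \mathcal{U}} w'_S \\
&= \sum_{S=\{l,U_S\} : U_S \in \mathcal{U}} \Big( \sum_{u \in U_S} c_{ul} + \sum_{u \in U_S,\, v \in P \setminus U_S} w_{uv} \Big) \\
&\geq \sum_{u \in P} c_{u\phi(u)} + \frac{1}{2} \sum_{u,v \in P : \phi(u) \neq \phi(v)} w_{uv} \\
&= val_{ULP}(\phi).
\end{aligned}
\]

The following inequalities are valid

\[ \sum_{u \in P} c_{u\phi(u)} \leq \sum_{S=\{l,U_S\} : U_S \in \mathcal{U}} \ \sum_{u \in U_S} c_{ul} \qquad (4) \]

and

\[ \frac{1}{2} \sum_{u,v \in P : \phi(u) \neq \phi(v)} w_{uv} \leq \sum_{S : U_S \in \mathcal{U}} \ \sum_{u \in U_S,\, v \in P \setminus U_S} w_{uv}. \qquad (5) \]

The inequality (4) is valid, since the algorithm assigns $\phi(u)$ to $u$ if there exists
a set $U_S$ in $\mathcal{U}$ such that $u \in U_S$ and $\phi(u) = S \cap L$. Thus, the cost $c_{u\phi(u)}$ also
appears in the right hand side of the inequality.

The argument for the inequality (5) is similar. If $\phi(u) \neq \phi(v)$ then, by the
execution of the algorithm, there must exist two sets $U_{S_x} \in \mathcal{U}$ and $U_{S_y} \in \mathcal{U}$
such that $\phi(u) = S_x \cap L$ and $\phi(v) = S_y \cap L$. It is not allowed to occur both
$\{u, v\} \subseteq U_{S_x}$ and $\{u, v\} \subseteq U_{S_y}$. Therefore, if the costs $w_{uv}$ and $w_{vu}$ appear,
once or twice, in the left hand side of the inequality, they must appear, at least
once, in the right hand side. □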

Lemma 7. $val_{sc}(\mathcal{U}_{OPT}) \leq 2\, val_{ULP}(\phi_{OPT})$.

Proof.

\[
\begin{aligned}
2\, val_{ULP}(\phi_{OPT}) &= 2 \sum_{S \in \phi_{OPT}} c_S \\
&\geq \sum_{S=\{l,U_S\} \in \phi_{OPT}} \Big( c_S + \frac{1}{2} \sum_{u \in U_S,\, v \in P \setminus U_S} w_{uv} \Big) \\
&= \sum_{t \in SC(\phi_{OPT})} w'_t \\
&\geq val_{sc}(\mathcal{U}_{OPT}), \qquad (6)
\end{aligned}
\]

where inequality (6) is valid since $SC(\phi_{OPT})$ is a solution for the Set Cover
Problem, but not necessarily with optimum value. □

Theorem 8. If $I$ is an instance for the ULP, $\phi$ is the labeling generated by the
algorithm Gul and $\phi_{OPT}$ is an optimum labeling for $I$ then

\[ val_{ULP}(\phi) \leq 2\beta\, val_{ULP}(\phi_{OPT}), \]

where $\beta$ is the approximation factor of the algorithm $\mathcal{A}_{sc}$ given as a parameter.

Proof. Let $\mathcal{U}$ be the solution returned by the algorithm $\mathcal{A}_{sc}$ (step 4 of the
algorithm Gul) and $\mathcal{U}_{OPT}$ an optimal solution for the corresponding Set Cover
instance. In this case, we have

\[
\begin{aligned}
val_{ULP}(\phi) &\leq val_{sc}(\mathcal{U}) && (7) \\
&\leq \beta\, val_{sc}(\mathcal{U}_{OPT}) && (8) \\
&\leq 2\beta\, val_{ULP}(\phi_{OPT}), && (9)
\end{aligned}
\]

where the inequality (7) is valid by Lemma 6, the inequality (8) is valid since
$\mathcal{U}$ is found by a $\beta$-approximation algorithm, and the inequality (9) is valid by
Lemma 7. □

To obtain an $8 \log n$-approximation algorithm for the ULP we need to present
an algorithm $\mathcal{A}_{sc}$ that is a $4 \log n$-approximation for the Set Cover Problem. The
algorithm is based on an approximation algorithm for the Quotient Cut Problem,
which we describe in the following subsection.

3.2 Showing $\mathcal{A}_{sc}$

In this section we show how to obtain a greedy algorithm for the Set Cover
Problem without the explicit generation of all possible sets. The algorithm basically
generates a graph $G_l$ for each possible label $l \in L$ and obtains a set $U_S$ with
reduced amortized cost.

Consider a label $l \in L$. We wish to find a set $U_S$ that minimizes $w_S / |U_S \cap U|$,
where $U$ is the set of unclassified objects in the iteration. Denote by $G_l$ the
complete graph with vertex set $V(G_l) = P \cup \{l\}$ and edge costs $w_{G_l}$ defined as
follows:

\[
\begin{array}{ll}
w_{G_l}(u, v) := w_{uv} & \forall u, v \in P,\ u \neq v, \\
w_{G_l}(u, l) := c_{ul} & \forall u \in P.
\end{array}
\]

In other words, the cost of an edge between the label and an object is the
assignment cost and the cost of an edge between two objects is the separation
cost.

Lemma 9. If $C \subseteq P$ is a cut of $G_l$, $l \notin C$, the cost of $C$, $c(C) =
\sum_{u \in C,\, v \notin C} w_{G_l}(u, v)$, is equal to the cost of the set $U_S$, $S = \{l, C\}$, to the Set
Cover Problem.

Proof. The lemma can be proved by counting. In the Set Cover Problem, $w_S$ is
equal to

\[ \sum_{u \in U_S} c_{ul} + \sum_{u \in U_S,\, v \in P \setminus U_S} w_{uv}, \quad \text{where } S = \{l, U_S\}, \]

that is equal to the cost of the cut $C$. See Figure 1. □

Fig. 1. Example of a cut $C$ that has the same cost $w_S$ for $U_S = C$.
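The counting argument of Lemma 9 can be verified by brute force on a small graph $G_l$. The sketch below is illustrative: the edge representation (a dictionary keyed by `frozenset` pairs), the function names and the example costs are assumptions made for this example.

```python
from itertools import combinations

def build_label_graph(label, objects, c, w):
    """Edge costs of G_l: separation costs between objects, assignment costs to the label."""
    edges = {}
    for u, v in combinations(objects, 2):
        edges[frozenset((u, v))] = w.get((u, v), w.get((v, u), 0))
    for u in objects:
        edges[frozenset((u, label))] = c[(u, label)]
    return edges

def cut_cost(edges, side):
    # An edge crosses the cut when exactly one endpoint lies in `side`.
    return sum(cost for edge, cost in edges.items() if len(edge & side) == 1)

def set_cover_weight(label, members, objects, c, w):
    assign = sum(c[(u, label)] for u in members)
    separate = sum(w.get((u, v), w.get((v, u), 0))
                   for u in members for v in objects if v not in members)
    return assign + separate

objects = ["p1", "p2", "p3"]
c = {("p1", "B"): 10, ("p2", "B"): 1, ("p3", "B"): 2}
w = {("p1", "p2"): 2, ("p1", "p3"): 1, ("p2", "p3"): 3}
graph = build_label_graph("B", objects, c, w)

# Compare the two quantities for every non-empty cut that excludes the label.
all_match = all(
    cut_cost(graph, frozenset(sub)) == set_cover_weight("B", sub, objects, c, w)
    for r in range(1, len(objects) + 1)
    for sub in combinations(objects, r)
)
```

For the cut {p2, p3}, for example, both computations give 6.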



The problem of finding a set with smallest amortized cost $c(C)/|C|$ in $G_l$ is a
special case of the Quotient Cut Problem, which can be defined as follows.

Quotient Cut Problem (QCP): Given a graph $G$, weights $w_e \in \mathbb{R}^+$ for each
edge $e \in E(G)$, and weights $c_v \in \mathbb{Z}^+$ for each vertex $v \in V(G)$, find a cut
$C$ that minimizes $w(C) / \min\{\pi(C), \pi(\overline{C})\}$, where $w(C) := \sum_{u \in C,\, v \notin C} w_{uv}$
and $\pi(C) := \sum_{v \in C} c_v$.

If we define a weight function $c_l$ for each vertex of $G_l$ as

\[
c_l(v) = \begin{cases}
1 & \text{if } v \in V(G_l) \cap P \cap U, \\
0 & \text{if } v \in V(G_l) \cap (P \setminus U), \\
n & \text{if } v \in V(G_l) \cap L,
\end{cases}
\]

then a cut $C := \min_{l \in L} \{QC(G_l, w_{G_l}, c_l)\}$, returned by an $f$-approximation algorithm
$QC$ for the QCP, corresponds to a set $U_S$, $S = \{l, C\}$, whose amortized cost is at most
$f$ times the minimum amortized cost. An $\mathcal{A}_f$ algorithm
can be implemented choosing this set. Thus, the following result follows from
Theorem 5.

Theorem 10. If there exists an $f$-approximation algorithm for the Quotient
Cut Problem with time complexity $T(n)$, then there exists a $2 f H_n$-approximation
algorithm for the ULP, with time complexity $O(n\, m\, T(n))$.

The Quotient Cut Problem is NP-Hard and the best approximation algo- 
rithm has an approximation factor of 4 due to Freivalds [9] . This algorithm has 
experimental time complexity 0(n^ ®) when the degree of each vertex is a small 
constant and 0(n^ ®) for dense graphs. Although this algorithm has polynomial 
time complexity estimated experimentally, it is not proved to be of polynomial 
time in the worst case. 

Theorem 11. There is an $8 \log n$-approximation algorithm for the ULP, with
time complexity estimated in 0{mn^'^). 

Note that the size of an instance I, size{I), is 0{n{n-\-m)), thus, the estimated 
complexity of our algorithm is 0{size{I)^'^). 

4 Computational Experiments 

We performed several tests with the algorithm Gul, the algorithm presented
by Kleinberg and Tardos [12], denoted by $A_K$, and the algorithm presented by
Chekuri et al. [4], denoted by $A_C$. The tests were performed over instances generated
computationally and from an image restoration problem. The implementation
of the $\mathcal{A}_{sc}$ algorithm, called at step 4 of the algorithm Gul, is such that it generates
solutions for the Set Cover Problem without intersections between the sets.
We observe that the computational resources needed to solve an instance with
the presented algorithm are very small compared with the previous approximation
algorithms cited above. All the implemented algorithms and the instances
are available upon request. The tests were performed on an Athlon XP at 1.2
GHz with 700 MB of RAM, and the linear programs were solved by the Xpress-MP
solver [14].

We started our tests setting $(n, m) \leftarrow (\lfloor 2k/3 \rfloor, \lceil k/3 \rceil)$ for a series of values of $k$
and creating random instances as follows: $c_{ui} \leftarrow random(1000)$, $\forall u \in P$, $\forall i \in L$, and
$w_{uv} \leftarrow random(10)$, $\forall u, v \in P$, where $random(t)$ returns a random value between
0 and $t$. It must be clear that these are not necessarily hard instances for the
problem or for the algorithms. Unfortunately, we could not find hard instances
for the uniform labeling problem publicly available.
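A generator for instances of this kind might look as follows. This is a sketch under several assumptions: the sizes are taken as $(\lfloor 2k/3 \rfloor, \lceil k/3 \rceil)$, which matches the 46 x 24 and 80 x 40 instances reported in this section; `random(t)` is interpreted as a uniform integer in [0, t]; the separation costs are stored once per unordered pair; and the function name `random_instance` is hypothetical.

```python
import math
import random

def random_instance(k, seed=None):
    """Random ULP instance with n = floor(2k/3) objects and m = ceil(k/3) labels."""
    rng = random.Random(seed)
    n, m = (2 * k) // 3, math.ceil(k / 3)
    objects, labels = list(range(n)), list(range(m))
    # Assignment costs c_ui drawn from [0, 1000] (integers, by assumption).
    c = {(u, i): rng.randint(0, 1000) for u in objects for i in labels}
    # Separation costs w_uv drawn from [0, 10], one value per unordered pair.
    w = {(u, v): rng.randint(0, 10)
         for u in objects for v in objects if u < v}
    return objects, labels, c, w

objects, labels, c, w = random_instance(70, seed=1)
```

With `k = 70` this yields 46 objects and 24 labels, and with `k = 120` it yields 80 objects and 40 labels.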

The largest instance executed by the algorithm $A_C$ has 46 objects and 24
labels and the time required was about 16722 seconds. When applying the algorithm
Gul to the same instance, we obtained a result that is 0.25% worse in 7
seconds.

The largest instance solved by the algorithm $A_K$ has 80 objects and 40 labels
and the time required to solve it was about 15560 seconds. When applying
Gul to the same instance, we obtained a result that is 1.01% worse in 73 seconds.

The major limitation to the use of the algorithm $A_K$ was the time complexity,
while the major limitation to the use of the algorithm $A_C$ was the memory and
time complexity. The times spent by the algorithms $A_K$ and $A_C$ are basically
the times to solve the corresponding linear programs. See Figure 2 to compare
the time of each algorithm.




Fig. 2. Comparing the experimental time of the implemented algorithms (horizontal axis: n + m).



One can observe in Figure 2 that there is a sharp reduction in the running
time of the Gul algorithm when $(n + m)$ reaches 290.

This behavior results from the following observation: when the separation
costs are large relative to the assignment costs, the stars are more likely
to be larger, resulting in a shorter convergence time.
The expectation of the separation costs in our instances, for stars with a small
number of objects, is proportional to $n$, while the expectation of the assignment
(connection) costs is kept constant. Thus, growing $n + m$ also grows the separation cost. The
average number of objects assigned per iteration just before the time reduction
is 1.2, which is near the worst case, while just after the reduction
the average number of objects assigned per iteration is 32.3.

For all these instances for which we could solve the linear program relaxation,
the maximum error ratio between the solutions obtained by the algorithm Gul and
the solution of the linear relaxation is 0.0127 (or 1.27% worse).

We also performed some tests with the algorithms Gul and $A_K$ for instances
generated by fixing $n$ or $m$ and varying the other. The results are presented
in Table 1. The maximum error ratio obtained is 1.3% and the time gained is
considerable. As expected, when the number of labels is fixed, the time gained
is more significant.



Table 1. Classification times and costs using the Kleinberg and Tardos formulation and
the primal dual algorithm.

      Instance           Times             Costs           val(Gul)/val(A_K)
  Objects  Labels     Gul    A_K     val(Gul)  val(A_K)
     40       5         0      1       9388      9371          1.001
     40      15         3      6       5962      5923          1.007
     40      25         5     12       5085      5061          1.005
     40      35         6     30       4763      4732          1.007
     40      45         8     49       4533      4477          1.013
     40      55        10     75       4587      4552          1.008
     40      65        11     68       4364      4348          1.004
      5      40         0      1        137       137          1.000
     15      40         0      1        812       812          1.000
     25      40         2      3       1817      1802          1.008
     35      40         4     17       3736      3710          1.007
     45      40        11     77       5986      5986          1.000
     55      40        22    400       8768      8735          1.004
     65      40        33   2198      12047     11975          1.006




To illustrate the applicability of the primal dual algorithm in practical instances,
we have applied the algorithm to the image restoration problem with
an image degraded by noise. The image has pixels in gray scale and dimension
60x60, with a total of 3600 pixels (objects) to be classified in black and white
colors. To define the assignment and separation costs, we consider that each color
is an integer between 0 and 255. The assignment cost of an object $u$ to a label $i$
is given by $c_{ui} = |clr(u) - clr(i)|$, where $clr(u)$ is the actual color of $u$ and $clr(i)$
is the color assigned to the label $i$. The separation cost of an object $u$ and an
object $v$ is given by $w_{uv} = 255 - |clr(u) - clr(v)|$ if $u$ is one of the nine direct
neighbors of $v$, and $w_{uv} = 0$ otherwise.
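The cost definitions for the image instance can be sketched as follows. The code is an illustration under assumptions: the neighborhood is taken to be the other cells of the 3x3 window around a pixel, each neighboring pair is stored once, and the function name `image_costs` and the tiny 2x2 test image are hypothetical.

```python
def image_costs(img, label_colors):
    """Assignment costs c and separation costs w for a gray-scale image."""
    rows, cols = len(img), len(img[0])
    # c_ui = |clr(u) - clr(i)| for every pixel u and label i.
    c = {((r, q), lab): abs(img[r][q] - col)
         for r in range(rows) for q in range(cols)
         for lab, col in label_colors.items()}
    w = {}
    for r in range(rows):
        for q in range(cols):
            for dr in (-1, 0, 1):
                for dq in (-1, 0, 1):
                    r2, q2 = r + dr, q + dq
                    inside = 0 <= r2 < rows and 0 <= q2 < cols
                    # Store each neighboring pair once, smaller coordinate first.
                    if inside and (r, q) < (r2, q2):
                        w[((r, q), (r2, q2))] = 255 - abs(img[r][q] - img[r2][q2])
    return c, w

img = [[0, 10],
       [250, 255]]
c, w = image_costs(img, {"black": 0, "white": 255})
```

Pixels with similar colors receive a high separation cost (for example 245 between the two top pixels), which discourages assigning them different labels.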

The following images, presented in Figures 3-6, show the results obtained
by applying the primal dual algorithm. In each figure, the image on the left is
obtained by inserting some noise and the image on the right is the image obtained
after the classification of the objects.

Fig. 3. Image with 25% of noise. Time needed is 288 seconds.

Fig. 4. Image with 50% of noise. Time needed is 319 seconds.

Fig. 5. Image with 75% of noise. Time needed is 511 seconds.

Fig. 6. Image with 100% of noise. Time needed is 973 seconds.

The time difference between image classifications arises because the separation
cost in images with less noise is bigger than in images with more noise. Thus
the average size of the stars chosen by the algorithm in images with less noise is
bigger, resulting in a shorter convergence time.

Although these times are large for image processing applications, they illustrate
the algorithm's performance and solution quality. In addition, in real instances, it is possible
to define small windows over larger images, and the processing time will
decrease. Clearly, the classification problem is more general and the primal dual
algorithm is appropriate for moderate and large size instances.

5 Concluding Remarks 

We have presented a primal dual algorithm with approximation factor 8 log n for 
the Uniform Labeling Problem. We compared the primal dual algorithm with LP- 
based approximation algorithms. The previous approximation algorithms for this 
problem have high time complexity and are adequate only for small and moderate 
size instances. The presented algorithm could obtain high quality solutions. The 
average error ratio of the presented algorithm was less than 2% of the optimum 
solution, when it was possible to compare, and it could obtain solutions for 
moderate and large size instances. 

We would like to thank K. Freivalds for making available his code for the
Quotient Cut Problem.

Abstract. This paper deals with systems of multiple mobile robots each
of which observes the positions of the other robots and moves to a new 
position so that eventually the robots form a circle. In the model we 
study, the robots are anonymous and oblivious, in the sense that they 
cannot be distinguished by their appearance and do not have a common 
x-y coordinate system, while they are unable to remember past actions. 
We propose a new distributed algorithm for circle formation on the plane. 
We prove that our algorithm is correct and provide an upper bound for 
its performance. In addition, we conduct an extensive and detailed com- 
parative simulation experimental study with the DK algorithm described 
in [7] . The results show that our algorithm is very simple and takes con- 
siderably less time to execute than algorithm DK. 



1 Introduction, Our Results, and Related Work 

Lately, the field of cooperative mobile robotics has received a lot of attention
from various research institutes and industries. A focus of these research and
development activities is that of distributed motion coordination, since it allows
the robots to form certain patterns and move in formation, cooperating
to achieve certain tasks. Motion planning algorithms for robotic
systems made up of robots that change their position in order to form a given
pattern are very important and may become challenging in the case of severe
limitations, such as in communication between the robots, hardware constraints,
obstacles etc.

Positioning the robots according to some given patterns may
be useful for various tasks, such as bridge building, forming adjustable buttresses
to support collapsing buildings, satellite recovery, or tumor excision [12].
Also, distributed motion planning algorithms for robotic systems are potentially
useful in environments that are inhospitable to humans or are hard to directly
control and observe (e.g. space, undersea).

* This work has been partially supported by the IST Programme of the European
Union under contract numbers IST-2001-33116 (FLAGS) and IST-2001-33135 (CRESCCO).

In this paper, we consider a system of multiple robots that move on the plane. 
The robots are anonymous and oblivious, i.e. they cannot be distinguished by a 
unique id or by their appearance, do not have a common x-y coordinate system 
and are unable to remember past actions. Furthermore, the robots are unable 
to communicate directly (i.e. via a wireless transmission interface) and can only 
interact by observing each other’s position. Based on this model we study the 
problem of the robots positioning themselves to form a circle. 

Note that the formation of a circle provides a way for robots to agree on a
common origin point and a common unit distance [15]. Such an agreement allows
a group of robots to move in formation [9]. In addition, formation of patterns and
flocking of a group of mobile robots is also useful for providing communication
in ad-hoc mobile networks [5,4,3].

Related Work. The problem of forming a circle having a given diameter by
identical mobile robots was first discussed by Sugihara and Suzuki [14]; they
proposed a simple heuristic distributed algorithm, which however forms only an
approximation of a circle (one that resembles a Reuleaux triangle). In an attempt to
overcome this problem, Suzuki and Yamashita [16] propose an algorithm under
which the robots eventually reach a configuration where they are arranged at
regular intervals on the boundary of a circle. However, to succeed in forming the
pattern, the robots must be able to remember all past actions. Lately, Defago
and Konogaya [7] designed an algorithm that manages to form a proper circle.

Under a similar model, in which robots have a limited vision, Ando et al. [1]
propose an algorithm under which the robots converge to a single point. Flocchini
et al. [8] study the same problem by assuming that the robots have a common
sense of direction (i.e. through a compass) and without considering instantaneous
computation and movement.

We here note that there exist other models and problems for the development 
of motion planning algorithms for robotic systems, e.g. [6,17]. In that model most 
of the existing motion planning strategies rely on centralized algorithms to plan 
and supervise the motion of the system components [6], while recently efficient 
distributed algorithms have been proposed [17]. Our work is also inspired by 
problems of coordinating pebble motion in a graph, introduced in [10]. 

Our Contribution. In this work we use and extend the system model stated
by Defago and Konogaya [7]. Under this model we present a new distributed
algorithm that moves a team of anonymous mobile robots in such a way that
a (non degenerate) circle is formed. The new algorithm presented here is based
on the observation that the Defago-Konogaya algorithm (DK algorithm) [7]
uses some very complex computational procedures. In particular, the use of
(computationally intensive) Voronoi diagrams in the DK algorithm is necessary
to avoid the very specific possibility in which at least two robots share at some
time the same position and also have a common coordinate system. We remark
that in many cases (e.g. when the system is comprised of not too many robots
and/or it covers a large area) this possibility is not very probable. Based on
this remark, we provide a new algorithm, which avoids the expensive Voronoi
diagram calculations. Instead our algorithm just moves the robots towards the
closest point on the circumference of the smallest enclosing circle. We prove that
our algorithm is correct and that the robots travel a shorter distance than when
executing the DK algorithm, i.e. the performance of the DK algorithm is an
upper bound for the performance of our algorithm.
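The basic move described above, heading for the closest point on the circumference, amounts to a radial projection. The sketch below assumes that the center and radius of the smallest enclosing circle have already been computed by some other routine; the function name `closest_point_on_circle` is hypothetical, and the handling of a robot located exactly at the center is an assumption, since this text does not specify it.

```python
import math

def closest_point_on_circle(p, center, radius):
    """Project point p radially onto the circle with the given center and radius."""
    dx, dy = p[0] - center[0], p[1] - center[1]
    dist = math.hypot(dx, dy)
    if dist == 0:
        # Every boundary point is equally close to the center; this case is
        # left to the caller (an assumption made for this sketch).
        raise ValueError("point coincides with the center")
    scale = radius / dist
    return (center[0] + dx * scale, center[1] + dy * scale)

target = closest_point_on_circle((3.0, 4.0), (0.0, 0.0), 10.0)
```

A robot at (3, 4) inside a circle of radius 10 centered at the origin would thus head for (6, 8).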

Furthermore we conduct a very detailed comparative simulation experimental 
study of our algorithm and the DK algorithm, in order to validate the theoretical 
results and further investigate the performance of the two algorithms. We remark 
that this is the first work on implementing and simulating the DK algorithm. 
The experiments show that the execution of our algorithm is very simple and 
takes considerably less time to complete than the DK algorithm. Furthermore, 
our algorithm seems to be more efficient (both with respect to number of moves 
and distance travelled) in systems that are made up from a large number of 
mobile robots. 

We now provide some definitions that we will use in the following sections. 

Smallest Enclosing Circle. The smallest enclosing circle of a set of points P
is denoted by SEC(P). It is determined either by two diametrically opposite
points or by at least three points on its boundary. The smallest enclosing circle
is unique and can be computed in O(n log n) time [13].
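As a minimal sketch of this definition (not the O(n log n) method of [13]), the following Python code finds SEC(P) by brute force: it tries every circle having two points as a diameter and every circumcircle of three points, and keeps the smallest one that encloses all points. The function names and the tolerance EPS are assumptions made for this illustration.

```python
import itertools
import math

EPS = 1e-9  # floating-point tolerance (an assumption of this sketch)

def circle_from_two(a, b):
    """Circle having segment ab as a diameter."""
    center = ((a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0)
    return center, math.dist(a, b) / 2.0

def circle_from_three(a, b, c):
    """Circumcircle of a, b, c, or None if the points are collinear."""
    d = 2.0 * (a[0] * (b[1] - c[1]) + b[0] * (c[1] - a[1]) + c[0] * (a[1] - b[1]))
    if abs(d) < EPS:
        return None
    a2, b2, c2 = (p[0] ** 2 + p[1] ** 2 for p in (a, b, c))
    ux = (a2 * (b[1] - c[1]) + b2 * (c[1] - a[1]) + c2 * (a[1] - b[1])) / d
    uy = (a2 * (c[0] - b[0]) + b2 * (a[0] - c[0]) + c2 * (b[0] - a[0])) / d
    return (ux, uy), math.dist((ux, uy), a)

def encloses(circle, points):
    """True if every point lies inside or on the circle (up to EPS)."""
    center, radius = circle
    return all(math.dist(center, p) <= radius + EPS for p in points)

def smallest_enclosing_circle(points):
    """Brute-force SEC(P): returns (center, radius)."""
    points = list(points)
    if len(points) == 1:
        return points[0], 0.0
    candidates = [circle_from_two(a, b)
                  for a, b in itertools.combinations(points, 2)]
    for a, b, c in itertools.combinations(points, 3):
        circle = circle_from_three(a, b, c)
        if circle is not None:
            candidates.append(circle)
    valid = [circle for circle in candidates if encloses(circle, points)]
    return min(valid, key=lambda circle: circle[1])
```

The brute-force search is only meant to make the "two opposite points or three points" characterisation concrete; it is far too slow for large robot teams.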

Voronoi Diagram. The Voronoi diagram of a set of points P = {p_1, p_2, ..., p_n},
denoted by Voronoi(P), is a subdivision of the plane into n cells, one for each
point in P. The cells have the property that a point q in the plane belongs to
the Voronoi cell of point p_i, denoted Vcell_{p_i}(P), if and only if, for every other
point p_j ∈ P, dist(q, p_i) < dist(q, p_j), where dist(p, q) is the Euclidean distance
between two points p and q. In particular, the strict inequality means that points
located on the boundary of the Voronoi diagram do not belong to any Voronoi
cell. More details on Voronoi diagrams can be found in [2].
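The membership condition can be sketched directly in Python. This is only an illustration of the definition (it does not build the diagram); the function name in_voronoi_cell is an assumption of this sketch.

```python
import math

def in_voronoi_cell(q, i, points):
    """True if q lies in the open Voronoi cell of points[i].

    The strict inequality mirrors the definition: a point that is
    equidistant from two sites belongs to no cell.
    """
    d_i = math.dist(q, points[i])
    return all(d_i < math.dist(q, p)
               for j, p in enumerate(points) if j != i)
```

For two sites (0, 0) and (2, 0), the point (0.5, 0) belongs to the cell of the first site, while the point (1, 0) lies on the bisector and belongs to neither cell.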



2 The Model 

Let r_1, r_2, ..., r_n be a set of extremely small robots, modelled as mobile
processors with infinite memory and a sensor to detect the instantaneous position
of all robots (i.e. a radar). Movement is accomplished with very high precision 
in an unbounded two dimensional space devoid of any landmark. Each robot 
uses its own local x-y coordinate system (origin, orientation, distance) and has 
no particular knowledge of the local coordinate system of other robots, nor of a 
global coordinate system. It is assumed that initially all robots occupy different 
positions, although, due to their small size, two or more robots may occupy the 
same position at some time. We furthermore assume that no two robots are
located on the same radius of the Smallest Enclosing Circle, an assumption needed
to guarantee the correctness of our protocol. Note that due to the small size of 
the robots, the probability of failure of our algorithm is very small (as can be 
shown by a simple balls-and-bins argument). 

Robots are anonymous in the sense that they are unable to uniquely identify 
themselves. All robots execute the same deterministic algorithm, and thus have 
no way to generate a unique identity for themselves; more generally, no random- 
ization can be used and any two independent executions of the algorithm with 
identical input values always yield the same output. 

Time is represented as an infinite sequence of time instants t_0, t_1, t_2, ... and
at each time instant t, every robot is either active or inactive. Without loss of
generality we assume that at least one robot is active at every time instant. At each
time instant during which a robot is active, it computes a new position
using a given algorithm and moves towards that position. Conversely, when a
robot is inactive, it stays still and does not perform any local computation. We
use A_t to denote the set of active robots at t and call the sequence A = A_0, A_1, ...
an activation schedule. We assume that every robot becomes active at infinitely
many time instants, but no additional assumptions are made on the timing with
which the robots become active. Thus A need only satisfy the condition that
every robot appears in infinitely many of the sets A_t.
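One way to produce an activation schedule that meets both requirements (a non-empty active set at every instant, and every robot active infinitely often) is sketched below. The generator name fair_schedule, the use of a seeded random source and the forced round-robin activation are all assumptions of this sketch; the model itself allows any schedule satisfying the two conditions.

```python
import itertools
import random

def fair_schedule(n, seed=0):
    """Yield active sets for successive time instants, as sets of indices in range(n)."""
    rng = random.Random(seed)
    for t in itertools.count():
        # Each robot is independently active with probability 1/2 ...
        active = {i for i in range(n) if rng.random() < 0.5}
        # ... and robot (t mod n) is always active at instant t, so the set is
        # never empty and every robot is active at least once in any n instants.
        active.add(t % n)
        yield active
```

Taking a finite prefix, for example with itertools.islice(fair_schedule(4), 12), gives twelve non-empty active sets in which each of the four robots appears at least three times.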

Given a robot r_i, p_i(t) denotes its position at time t, according to some global
x-y coordinate system (which is not known to the robots) and p_i(0) is its initial
position. P(t) = {p_i(t) | 1 ≤ i ≤ n} denotes the multiset of the positions of all
robots at time t.

The algorithm that each robot r_i uses is a function φ that is executed each
time r_i becomes active and determines the new position of r_i, which must be
within one distance unit of the previous position, as measured by r_i's own
coordinate system. The arguments to φ consist of the current position of r_i and
the multiset of points containing the observed positions of all robots at the
corresponding time instant, expressed in terms of the local coordinate system of
r_i. It is assumed that obtaining the information about the system, computing
the new position and moving towards it is instantaneous. Note that in this
paper we consider oblivious algorithms and thus φ is not capable of storing any
information on past actions or previous observations of the system. The model
we use is similar to that of Defago and Konogaya [7], which in turn is based
on the model of Suzuki and Yamashita [16].
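The constraint that a robot moves at most one distance unit per activation can be sketched as a small helper. The name step_toward and the parameter max_step are hypothetical; the sketch also assumes that the robot's local unit of length is 1 in the coordinates used.

```python
import math

def step_toward(pos, target, max_step=1.0):
    """Move from pos toward target, covering at most max_step."""
    dx, dy = target[0] - pos[0], target[1] - pos[1]
    dist = math.hypot(dx, dy)
    if dist <= max_step:
        return target  # the target is reachable in this activation
    scale = max_step / dist
    return (pos[0] + dx * scale, pos[1] + dy * scale)
```

A robot at (0, 0) aiming for (3, 4) would therefore stop at approximately (0.6, 0.8) after one activation and needs five activations in total.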



3 The Problem 

In this paper we consider the problem of positioning a set of mobile robots in
such a way that a circle is formed, with finite radius greater than zero. We
call such a circle non degenerate. We also consider the more difficult problem in
which the robots are arranged at regular intervals on the boundary of the circle.

Problem 1 (Circle Formation). Given a group of n robots r_1, r_2, ..., r_n with
distinct positions, located arbitrarily on the plane, arrange them to eventually
form a non degenerate circle.

Problem 2 (Uniform Circle Formation). Given a group of n robots r_1, r_2, ..., r_n
with distinct positions, located arbitrarily on the plane, eventually arrange them
at regular intervals on the boundary of a non degenerate circle.

A possible way to solve the problem of forming a uniform circle is to form a
“simple” circle and then transform the robot configuration as follows.

Problem 3 (Uniform Transformation). Given a group of n robots r_1, r_2, ..., r_n
with distinct positions, located on the boundary of a non degenerate circle,
eventually arrange them at regular intervals on the boundary of the circle.



4 A New Circle Formation Algorithm 

4.1 The Defago-Konogaya (DK) Algorithm 

We first briefly describe the Defago-Konogaya circle formation algorithm (the 
DK algorithm) that is described in [7]. The algorithm relies on two facts: (i) 
the environment observed by all robots is the same, in spite of the difference 
in local coordinate system and (ii) the smallest enclosing circle is unique and 
depends only on the relative positions of the robots. Based on these two facts, 
the algorithm makes sure that the smallest enclosing circle remains invariant 
and uses it as a common reference. 

Initially, given an arbitrary configuration in which all robots have distinct
positions, a sub-algorithm (φ_circle) brings the system towards a configuration in
which all robots are located on the boundary of the circle (i.e. solves prob. 1).
Then, a second sub-algorithm (φ_uniform) converges towards a homogeneous
distribution of the robots along that circumference, but it does not terminate
(i.e. solves prob. 3). Clearly, the combination of the above two sub-algorithms
solves the problem of Uniform Circle Formation (prob. 2).

Circle Formation Algorithm φ_circle. The main idea of the algorithm is very
simple: robots that are already on the boundary of the circle do not move and
robots that are in the interior of the circle are made to move towards the
boundary of the circle. When a robot that is located in the interior of the circle is
activated, it observes the positions of the other robots and computes the Voronoi
diagram. Given the boundaries of SEC(P) and the Voronoi cell where the robot
is located, it can find itself in one of three types of situations:

Case 1: When the circle intersects the Voronoi cell of the robot (see Fig. 1a),
the robot moves towards the intersection of the circle and the Voronoi cell.

Case 2: When the Voronoi cell of the robot does not intersect with the circle
(see Fig. 1b), the robot selects the point in its Voronoi cell which is nearest to
the boundary of the circle (or farthest from its center).

Case 3: Due to symmetry, there exist several such points (see Fig. 1c). In this case,
all solutions being equivalent, one is selected arbitrarily. This is for instance done
by keeping the solution with the highest x-coordinate (and then y-coordinate)
according to the local coordinate system of the robot.

Fig. 1. Illustration of algorithm φ_circle, executed by robot r_i (in each case, r_i moves
towards r_i')

Uniform Transformation Algorithm φ_uniform. Given that all robots are
located on the circumference of a circle, the algorithm φ_uniform converges toward
a homogeneous distribution of robots, but does not terminate deterministically.
The algorithm φ_uniform works as follows: when a robot r_i becomes active, it
considers its two direct neighbors prev(r_i) and next(r_i), computes the midpoint
between them and moves halfway towards it.

The reason for moving halfway toward the midpoint rather than toward 
the midpoint itself is to prevent situations where the system oscillates endlessly 
between two different configurations when robots are perfectly synchronized. The 
system would get stuck into an infinite cycle and hence be unable to progress 
toward an acceptable solution. 
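The halfway rule can be sketched in Python by describing each robot on the circle by its angle. This is only an illustration under stated assumptions: there are at least three robots with pairwise distinct angles, the "midpoint" is taken along the arc from prev to next, and the function name uniform_step is hypothetical.

```python
import math

TWO_PI = 2.0 * math.pi

def uniform_step(angles, i):
    """New angle of robot i: halfway toward the arc midpoint of its two neighbours.

    angles[k] is the angular position of robot k on the circle, in [0, 2*pi).
    """
    order = sorted(range(len(angles)), key=lambda k: angles[k])
    pos = order.index(i)
    prev_a = angles[order[pos - 1]]                 # neighbour before robot i
    next_a = angles[order[(pos + 1) % len(order)]]  # neighbour after robot i
    # Arc lengths measured counter-clockwise starting from prev.
    to_self = (angles[i] - prev_a) % TWO_PI
    to_next = (next_a - prev_a) % TWO_PI
    midpoint = to_next / 2.0
    # Cover only half of the gap between the current offset and the midpoint.
    new_offset = to_self + (midpoint - to_self) / 2.0
    return (prev_a + new_offset) % TWO_PI
```

In a configuration that is already uniform, such as three robots at angles 0, 2π/3 and 4π/3, the rule leaves every robot where it is.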

The authors prove that algorithm φ_circle is correct (Theorem 2, [7]) and solves
the problem of Circle Formation (prob. 1) and that algorithm φ_uniform converges
toward a configuration wherein all robots are arranged at regular intervals on
the boundary of a circle (Theorem 3, [7]) and thus solves the problem of Uniform
Transformation (prob. 3).

4.2 Our Algorithm (“direct”)

We now present a new distributed algorithm (which we call direct) that moves
a set of oblivious mobile robots in such a way that a (non degenerate) circle is
formed. The new algorithm is motivated by the observation that the DK
algorithm uses some very complex (i.e. computationally intensive) procedures.
Indeed, computing the Voronoi diagram based on the locations of the robots
and then moving towards the smallest enclosing circle involves computationally
complex procedures. Instead, our algorithm is much simpler since it just moves
the robots towards the closest point on the circumference of the smallest
enclosing circle. In Fig. 2, we provide a pseudo-code description. In more detail,
under algorithm direct a robot can find itself in one of the following two
situations:

Case 1: Robot r is located on the boundary of SEC(P); r stays still.

Case 2: Robot r is located inside SEC(P); r selects the closest point of the
boundary of SEC(P) to its current position and moves towards this point.



function φ'_circle(P, p_i)
1: begin
2:   if (p_i ∈ SEC(P) == true)
3:   begin
4:     stay still
5:   end
6:   else
7:   begin
8:     target := point on boundary of SEC(P) closest to p_i
9:     move toward target
10:  end
11: end

Fig. 2. Formation of an (arbitrary) circle (code executed by robot r_i)
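A single activation of algorithm direct might be sketched in Python as follows. This is an illustration rather than the authors' C++/LEDA implementation: it assumes the center and radius of SEC(P) have already been computed by some routine, applies the one-unit movement limit of the model through the hypothetical parameter max_step, and resolves the (unlikely) case of a robot sitting exactly at the center by an arbitrary direction.

```python
import math

EPS = 1e-9  # tolerance for the "on the boundary" test (an assumption)

def direct_step(p, center, radius, max_step=1.0):
    """One activation of a robot at position p, given SEC(P) as (center, radius)."""
    dx, dy = p[0] - center[0], p[1] - center[1]
    d = math.hypot(dx, dy)
    if d >= radius - EPS:
        return p  # pseudo-code line 4: already on the boundary, stay still
    if d < EPS:
        # Robot at the center: every boundary point is equally close,
        # so this sketch arbitrarily picks the positive x direction.
        ux, uy = 1.0, 0.0
    else:
        ux, uy = dx / d, dy / d  # unit vector pointing radially outward
    # Pseudo-code lines 8-9: the closest boundary point lies on the ray from
    # the center through p; move toward it by at most max_step.
    move = min(max_step, radius - d)
    return (p[0] + ux * move, p[1] + uy * move)
```

With center (0, 0) and radius 5, a robot at (1, 0) moves to (2, 0), a robot at (4.5, 0) reaches the boundary at (5, 0), and a robot already at (5, 0) does not move.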



In order to prove the correctness of algorithm direct we work in a similar
way as in [7]. We first prove that when all robots are positioned on the
boundary of the smallest enclosing circle, the algorithm terminates (Lemma 1). We
then show that the robots that are not positioned on the boundary of SEC(P)
will always move towards the boundary (and not further away from SEC(P),
Lemma 2). Finally we show that a finite period of time is needed for those robots
that are positioned inside SEC(P) to move towards the boundary of SEC(P)
(Lemma 4).



Lemma 1. Under Algorithm direct, all configurations in which the smallest
enclosing circle passes through all robots are stable.

Proof. In such configurations, p_i ∈ SEC(P) for all i ∈ {1, 2, ..., n}. Thus the
condition at line 2 of the pseudo-code description is satisfied for all robots, so every
robot stays still (line 4) and the whole configuration is stable. □



Lemma 2. No robot moves to a position further away from SEC(P) (than its
current position).

Proof. Under algorithm direct, robots move only at lines 4 and 9. Let us consider 
each case for some arbitrary robot r. 

1. At line 4, r stays still, so it obviously does not move towards the center. 

2. At line 9, r moves from the interior of the circle toward a point located on 
the boundary of the circle. So, r actually progresses away from the center. 

Thus, the robot r is unable to move further away (i.e. towards the circle center) 
from the boundary of the smallest enclosing circle in any of the two cases. □ 



Lemma 3. There exists at least one robot in the interior of SEC(P) that can
progress a non null distance towards SEC(P).

Proof. Under algorithm direct the motion of the robots that are located in the
interior of the smallest enclosing circle is not blocked by other robots, since
each robot’s movement in our algorithm is independent of the positions of other
robots. Therefore, at any given time instant t, a robot r_i that is active and is
located in the interior of the smallest enclosing circle will move towards the
boundary of the circle.

Thus, if a set of robots is located in the interior of SEC(P) and at least
one robot is active, there exists at least one robot (an active one) in the interior
of SEC(P) that can progress a non null distance towards SEC(P). This
proves the lemma. □



Lemma 4. All robots located in the interior of SEC(P) reach its boundary
after a finite number of activation steps.

Proof. By Lemma 2 no robot ever moves backwards. Let us denote by P_in the
set of all robots located in the interior of SEC(P). By Lemma 3, there is at least
one robot r ∈ P_in that can reach SEC(P) in a finite number of steps. Once r
has reached SEC(P), it does not belong to P_in anymore. So by Lemma 3, there
must be another robot in P_in that can reach SEC(P) in a finite number of steps.
Since there is a finite number of robots in P, there exists some finite time after
which all robots are located on the boundary of the smallest enclosing circle
SEC(P). □



Theorem 1. Algorithm direct solves the problem of Circle Formation (prob. 1).

Proof. By Lemma 4, there is a finite time after which all robots are located 
on the smallest enclosing circle, and this configuration is stable (by Lemma 1). 
Consequently, algorithm direct solves the Circle Formation problem. □ 



Corollary 1. The combination of functions φ'_circle and φ_uniform solves the
problem of Uniform Circle Formation (prob. 2).



Remark 1. Corollary 1 assumes that no two robots end up at the same circle
position after executing φ'_circle. This is in fact guaranteed in our model, since no
two robots are initially located exactly on the same radius. Note however that
this modelling assumption is not very restrictive. Indeed the probability of such
an event is upper bounded by




where m is the number of “different” radii and n is the number of robots.
Clearly, when m tends to infinity (due to the small size and high precision of
robots) then this upper bound tends to 0.
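As an illustration of how such a bound can arise (an assumption of this sketch, not necessarily the expression intended above), suppose each of the n robots independently falls on one of m equally likely radii. A union bound over all pairs of robots then gives:

```latex
\Pr[\text{some two robots share a radius}]
  \;\le\; \sum_{1 \le i < j \le n} \Pr[\text{robots } i \text{ and } j \text{ share a radius}]
  \;=\; \binom{n}{2}\,\frac{1}{m}
  \;=\; \frac{n(n-1)}{2m}
```

For a fixed number of robots n, the right-hand side tends to 0 as m tends to infinity, which is the behaviour described in the remark.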

We now provide a comparison of the performance of our algorithm to that 
of the DK algorithm. More specifically, we show that the distance covered by 
the robots when executing algorithm direct is at most the distance covered when 
executing algorithm DK. This is the initial intuition that led us to the design of 
the new algorithm. 





Fig. 3. Distance comparison of next position under the two algorithms 

Theorem 2. A robot r that executes algorithm direct covers a distance less than or
equal to the distance covered by r when executing function φ_circle of algorithm
DK, given that the initial conditions of the system are identical. In other words,
the distance covered by r when executing φ_circle is an upper bound for the
distance covered by r when executing algorithm direct.

Proof. Let r be a robot positioned at point A in the plane which is inside the
boundaries of SEC(P) (see Fig. 3a) and suppose r is executing algorithm direct. When r
is activated, it will move on the line that connects the center of SEC(P) (point
c) and A. Regardless of the number of steps that will be required in order to
reach the boundary of the circle, r will keep moving along the line cA. Let B be
the point where cA intersects SEC(P). Therefore the distance covered by r is
equal to |AB|.

Now let us assume that r is executing function φ_circle of algorithm DK. When
r is activated, it will start moving away from A towards a point B' located on
the boundary of SEC(P), i.e. not necessarily on the line AB.

Let C be a circle centered on A having radius equal to |AB|. We observe that
any point B' (other than B) that lies on the boundary of SEC(P) is positioned
outside the boundary of C and thus, based on the definition of the circle, further
from the center of C (i.e. A). Therefore |AB| < |AB'|.

In the case when r is moving towards B' after first passing through a point
D outside the circle C, its route will clearly have greater length than AB' (by the
triangle inequality) and thus greater than AB. For instance, considering the
example of Fig. 3b, it holds that: |AD| + |DB'| > |AB'| > |AB|.

Thus, if r follows any path other than AB in order to reach the boundary of
SEC(P), it will cover a greater distance, which proves the theorem. □
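The key inequality of the proof, |AB| ≤ |AB'| for every boundary point B', can be checked numerically. The sketch below samples boundary points of a circle and compares the straight-line distance from A with the radial distance; all function names and the sampling resolution are assumptions of this illustration, and a finite sample is of course not a proof.

```python
import math

def radial_distance(a, center, radius):
    """|AB|: distance from interior point a to the boundary along the ray from the center."""
    return radius - math.dist(a, center)

def distance_to_boundary_point(a, center, radius, theta):
    """|AB'|: distance from a to the boundary point at angle theta."""
    b = (center[0] + radius * math.cos(theta),
         center[1] + radius * math.sin(theta))
    return math.dist(a, b)

def radial_is_shortest(a, center, radius, samples=360):
    """Check |AB| <= |AB'| on evenly spaced boundary points B'."""
    ab = radial_distance(a, center, radius)
    return all(
        ab <= distance_to_boundary_point(
            a, center, radius, 2.0 * math.pi * k / samples) + 1e-9
        for k in range(samples)
    )
```

The inequality also follows from the triangle inequality applied to c, A and B': |AB'| ≥ |cB'| - |cA|, and |cB'| - |cA| equals the radius minus |cA|, which is |AB|.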



5 Experimental Evaluation 

In order to evaluate the performance of the two protocols and further investigate
their behavior we carried out a comparative experimental study. All of our
implementations follow closely the protocols described above and the simulation
environment that we developed is based on the model presented in section 2. They
have been implemented as C++ classes using several advanced two-dimensional
geometry data types and algorithms of LEDA [11]. The experiments were
conducted on a Linux box (Mandrake 8.2, Pentium III at 933 MHz, with 512 MB
memory at 133 MHz) using the g++ compiler ver. 2.95.3 and the LEDA library ver. 4.1.

In order to evaluate the performance of the above protocols we define a set
of efficiency measures, which we apply to each protocol
considered, for each problem separately. We start with the average number
of steps performed by each robot: we count the total number of steps performed by
each robot r_i when executing function φ and then divide the sum
by the number of robots. More formally, this is defined as follows:

Definition 1. Let M_i be the total number of steps performed by robot r_i when
executing function φ. Let avgM_φ = (Σ_{i=1}^{n} M_i) / n be the average number of steps
required for function φ to form a given pattern, where n is the number of robots
in the system.

Although avgM_φ provides a way to compare the performance of the two
protocols, we cannot use the total number of steps for measuring the time complexity
of a formation algorithm since a robot may remain inactive for an unpredictable
period of time. An alternative measure, also proposed in [16], is the total distance
that a robot must move to form a given pattern.

Definition 2. Let D_i be the total distance covered by robot r_i when executing
function φ. Let avgD_φ = (Σ_{i=1}^{n} D_i) / n be the average distance required for function
φ to form a given pattern, where n is the number of robots in the system.

Finally, another important efficiency measure is the total time of execution
required by function φ to form a given pattern. The computational complexity is
a very important efficiency measure since formation algorithms are executed in
real time and the computational power of the mobile modules can be a limiting
factor. Although the function φ is executed by each robot in a distributed fashion,
we consider the overall time required by the robots until the given pattern is
formed. Note that when calculating the execution time of function φ we assume
that the motion of the robots takes negligible time (i.e. the motion of the robots
is instantaneous).

Definition 3. Let T_i be the execution time required by robot r_i to execute
function φ. Let totT = Σ_{i=1}^{n} T_i be the total execution time required for function
φ to form a given pattern, where n is the number of robots in the system.
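The three measures reduce to simple aggregates over per-robot totals. The following Python sketch assumes that a simulation has already recorded, for each robot, its number of steps, distance covered and computation time; the function names and the list-based representation are hypothetical.

```python
def avg_steps(steps_per_robot):
    """Average number of steps (avgM): total steps divided by the number of robots."""
    return sum(steps_per_robot) / len(steps_per_robot)

def avg_distance(distance_per_robot):
    """Average distance covered (avgD): total distance divided by the number of robots."""
    return sum(distance_per_robot) / len(distance_per_robot)

def total_time(time_per_robot):
    """Total execution time (totT): sum of the per-robot execution times."""
    return sum(time_per_robot)
```

For three robots that took 4, 6 and 8 steps, the average number of steps is 6, whereas the total execution time is a sum and therefore grows with the number of robots even when the per-robot time stays constant.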

Based on the above, we start our experimentation by considering the effect
of the dimensions of the Euclidean space where the mobile robots move on the
performance of the two algorithms. More specifically we use a fixed number
of mobile robots (100) on a Euclidean plane of dimensions X ∈ [10, 50] and
X = Y. The robots are positioned using a random uniform distribution. Each
experiment is repeated at least 100 times in order to get good average results.

Figure 4 depicts the average number of steps and average distance travelled 
under the two algorithms for the circle formation problem (i.e. the first phase). 
We observe that the performance of the two algorithms is more or less linear in 
the dimensions of the plane. Actually, it seems that the gradient of the lines is 
similar for both algorithms. It is evident that algorithm direct manages to form
a circle requiring slightly fewer steps than the DK algorithm and the
robots move in a way such that less distance is covered. This result agrees with the
theoretical upper bound of Theorem 2.

Fig. 4. Circle Formation: Evaluating the performance for various Plane Dimensions.
(a) Average Number of Steps (avgM); (b) Average Distance Covered (avgD)



The results of the second phase of the algorithm, i.e. for the uniform trans- 
formation problem, are shown in Fig. 5 for the same efficiency measures. In this 
phase, the DK algorithm is performing substantially better regarding the average 
number of steps while the robots cover slightly less distance than in algorithm 
direct. Again we observe that the performance of the two algorithms is more or 
less linear in the dimensions of the plane. 

In Fig. 6 we get the combination of the results for the first and second phase, 
i.e. when considering the uniform circle formation problem. First, we observe that 
the average number of steps performed by the robots (see Fig. 6a) in the second
phase (φ_uniform) dominates the overall performance of the two algorithms.
Thus, the DK algorithm manages to form a uniform circle with a much smaller 
average number of steps per robot. However, regardless of the number of steps, 
under both algorithms the robots seem to cover similar distances (see Fig. 6b). 

Figure 10 depicts the total time required to execute each algorithm for both 
phases. Clearly, algorithm direct executes in significantly less time and actually 
the graph suggests that it takes about 60 times less than the DK algorithm. It is 
evident that algorithm direct is very simple to execute, making the total execution
time almost independent of the plane dimensions, while the execution time of
the DK algorithm seems to be linear in the plane dimensions.

In the second set of experiments we investigate the performance of the two
algorithms as the number of mobile robots increases (m ∈ [50, 1000]) while
keeping the dimensions of the Euclidean plane fixed (X = Y = 20); that is,
we investigate the effect of the density of robots on the performance of the two
algorithms.

Figure 7a depicts the average number of steps as the density of robots
increases for the first phase of the two algorithms (i.e. the circle formation
problem). The graph shows that the performance of the DK algorithm, in terms of
average number of steps, is linear in the total number of mobile robots. On the
other hand, algorithm direct seems to be unaffected by this system parameter,
especially when m > 200. This threshold behavior of the two algorithms is also
observed in Fig. 7b. For low densities of robots, the average distance increases
slowly with m until a certain number of robots is reached (m = 200), after which no
further increase is observed. Again, we observe that algorithm direct manages
to form a circle requiring the robots to move in a way such that less distance is
covered, which agrees with the theoretical upper bound of Theorem 2.

Fig. 5. Uniform Transformation: Evaluating the performance for various Plane
Dimensions

Fig. 6. Uniform Circle Formation: Evaluating the performance for various Plane
Dimensions

Regarding the second phase, in figure 8a we observe that both algorithms 
require a high number of steps, which, interestingly, decreases as the density 
of robots increases. However, for the DK algorithm, when the number of robots
crosses the value m = 200, the average number of steps starts to increase linearly
in the number of robots that make up the system. On the other hand, when
m > 200, the performance of algorithm direct seems to remain unaffected by the 
density of the robots. As a result, when m = 300 the two algorithms achieve 
the same performance, while for m > 300 algorithm direct outperforms the DK 
algorithm. 

The average distance covered by the robots under the two different
algorithms, as the density of the system increases, is shown in figure 8b. In this
graph we observe that the two algorithms have a different behavior than that
in the first phase. More specifically, initially the average distance
drops very fast as the density of the system increases until a certain value is
reached (m = 200), after which it starts to increase linearly with m.
Furthermore, taking into account the statistical error of our experiments, Fig. 8b
suggests that when the robots execute algorithm direct they travel about 5% less
distance than when executing algorithm DK.

Fig. 7. Circle Formation: Evaluating the performance as the number of Mobile Robots
increases




Fig. 8. Uniform Transformation: Evaluating the performance as the number of Mobile
Robots increases. (a) Average Number of Steps (avgM); (b) Average Distance
Covered (avgD)



When considering the uniform circle formation problem (i.e. for both phases
of the algorithms) it is clear that the overall behavior of the two algorithms is
dominated by the second phase (see Fig. 9a). It seems that algorithm direct
requires a constant number of steps when m > 300 while the DK algorithm’s
performance is linear in the number of mobile robots that make up the system.
As for the average distance covered, again, the behavior of the algorithms is
dominated by the second phase. Fig. 9b follows the same pattern as Fig. 8b.

Finally, Fig. 11 depicts the overall time required by the robots to execute
each algorithm. In this figure we clearly see that the DK algorithm has a very
poor performance as the density of the robots in the system increases. On the
other hand, the performance of algorithm direct is independent of the number
of robots and is clearly more suitable for systems comprised of a very large number
of robots.

Fig. 10. Uniform Circle Formation: Total Execution Time (avgT) vs. Plane
Dimensions

Fig. 11. Uniform Circle Formation: Total Execution Time (avgT) as the number
of Mobile Robots increases



6 Closing Remarks 

We presented a new algorithm for the problem of circle formation for systems
made up of anonymous mobile robots that cannot remember past actions. We
provided a proof of correctness and an upper bound on its performance.
The experiments show that the execution of our algorithm is very simple and takes
considerably less time to complete than the DK algorithm and that, in systems
made up of a large number of mobile robots, our algorithm is more efficient
than DK (with respect to number of moves and distance travelled by the robots).



Abstract. We investigate two cutting problems and their variants in
which orthogonal rotations are allowed. We present a dynamic program- 
ming based algorithm for the Two-dimensional Guillotine Cutting Prob- 
lem with Value (GCV) that uses the recurrence formula proposed by 
Beasley and the discretization points defined by Herz. We show that if 
the items are not so small compared to the dimension of the bin, this 
algorithm requires polynomial time. Using this algorithm we solved all 
instances of GCV found at the OR-LIBRARY, including one for which 
no optimal solution was known. We also investigate the Two-dimensional 
Guillotine Cutting Problem with Demands (GCD). We present a column 
generation based algorithm for GCD that uses the above-mentioned
algorithm to generate the columns. We propose two strategies to tackle the
residual instances. We report on some computational experiments with 
the various algorithms we propose in this paper. The results indicate that 
these algorithms seem to be suitable for solving real-world instances. 



1 Introduction 

Many industries face the challenge of finding solutions that are the most econom- 
ical for the problem of cutting large objects to produce specified smaller objects. 
Very often, the large objects (bins) and the small objects (items) have only two 
relevant dimensions and have rectangular shape. Besides that, a usual restriction 
for cutting problems is that in each object we may use only guillotine cuts, that 
is, cuts that are parallel to one of the sides of the object and go from one side 
to the opposite one; problems of this type are called two-dimensional guillotine 
cutting problems. This paper focuses on algorithms for such problems. They are
classical NP-hard optimization problems and are of great interest, from both a
theoretical and a practical point of view.

This paper is organized as follows. In Section 2, we present some definitions 
and establish the notation. In Section 3, we focus on the Two-dimensional
Guillotine Cutting Problem with Value (GCV) and also a variant of it in which the
items are allowed to be rotated orthogonally.

Section 4 is devoted to the Two-dimensional Guillotine Cutting Problem with 
Demands (GCD). We describe two algorithms for it, both based on the column 
generation approach. One of them uses a perturbation strategy we propose to 
deal with the residual instances. We also consider the variant of GCD in which 
orthogonal rotations are allowed. Finally, in Section 5 we report on the com- 
putational results we have obtained with the proposed algorithms. In the last 
section we present some final remarks. 

Owing to space limitations we do not prove some of the claims and we do 
not describe one of the approximation algorithms we have designed. For more 
details on these results we refer to [12]. 

2 Preliminaries 

The Two-dimensional Guillotine Cutting Problem with Value (GCV) is the
following: given a two-dimensional bin (a large rectangle), B = (W, H), with width
W and height H, and a list of m items (small rectangles), each item i with width
w_i, height h_i, and value v_i (i = 1, ..., m), determine how to cut the bin, using
only guillotine cuts, so as to maximize the sum of the values of the items that
are produced. We assume that many copies of the same item can be produced.

The Two-dimensional Guillotine Cutting Problem with Demands (GCD) is
defined as follows. Given an unlimited quantity of two-dimensional bins B =
(W, H), with width W and height H, and a list of m items (small rectangles),
each item i with dimensions (w_i, h_i) and demand d_i (i = 1, ..., m), determine
how to produce d_i units of each item i, using the smallest number of bins B.

In both problems GCV and GCD we assume that the items are oriented
(that is, rotations of the items are not allowed); moreover, w_i ≤ W, h_i ≤ H for
i = 1, ..., m. The variants of these problems in which the items may be rotated
orthogonally are denoted by GCV^r and GCD^r.

Our main interest in the problem GCV lies in its use as a routine in the
column generation based algorithm for the problem GCD. While the first problem
was first investigated in the sixties [18], we did not find in the literature results
on the problem GCD. We observe that any instance of GCD can be reduced to
an instance of the two-dimensional cutting stock problem (without demands),
by replacing each item $i$ with $d_i$ copies of this item; but this reduction is not
appropriate as the size of the new instance may become exponential in $m$.

We call each possible way of cutting a bin a cutting pattern (or simply
pattern). To represent the patterns (and the cuts to be performed) we use the
following convention. We consider the Euclidean plane with the $xy$ coordinate
system, and assume that the width of a rectangle is represented on the
$x$-axis, and the height is represented on the $y$-axis. We also assume that the
position $(0, 0)$ of this coordinate system represents the bottom left corner of the
bin. Thus a bin of width $W$ and height $H$ corresponds to the region defined by
the rectangle whose bottom left corner is at the position $(0, 0)$ and whose top right
corner is at the position $(W, H)$. To specify the position of an item $i$ in the bin,
we specify the coordinates of its bottom left corner. Using these conventions, it is
not difficult to define more formally what a pattern is and how we can represent
one.

A guillotine pattern is a pattern that can be obtained by a sequence of guil- 
lotine cuts applied to the original bin and to the subsequent small rectangles 
that are obtained after each cut (see Figure 1). 






Fig. 1. (a) Non-guillotine pattern; (b) Guillotine pattern. 



3 The Problem GCV 

In this section we focus on the Two-dimensional Guillotine Cutting Problem with 
Value (GCV). We present first some concepts and results needed to describe the 
algorithm. 

Let $I = (W, H, w, h, v)$, with $w = (w_1, \ldots, w_m)$, $h = (h_1, \ldots, h_m)$ and
$v = (v_1, \ldots, v_m)$, be an instance of the problem GCV. We consider that $W$, $H$, and
the entries of $w$ and $h$ are all integer numbers. If this is not the case, we can
obtain an equivalent integral instance simply by multiplying the widths and/or
the heights of the bin and of the items by appropriate numbers.

The first dynamic programming based algorithm for GCV was proposed by
Gilmore and Gomory [18]. (It did not solve GCV in its general form.) We use here
a dynamic programming approach for GCV that was proposed by Beasley [4]
combined with the concept of discretization points defined by Herz [19]. A
discretization point of the width (respectively of the height) is a value $i \le W$
(respectively $j \le H$) that can be obtained by an integer conic combination of
$w_1, \ldots, w_m$ (respectively $h_1, \ldots, h_m$). We denote by $P$ (respectively $Q$) the set
of all discretization points of the width (respectively height).

Following Herz, we say a canonical pattern is a pattern for which all cuts are
made at discretization points (e.g., the pattern indicated in Figure 1(b)).

It is immediate that it suffices to consider only canonical patterns (for every
pattern that is not canonical there is an equivalent one that is canonical). To refer
to them, the following functions will be useful. For a rational $x \le W$, let
$p(x) := \max\{i \mid i \in P,\ i \le x\}$ and for a rational $y \le H$, let
$q(y) := \max\{j \mid j \in Q,\ j \le y\}$.
Using these functions, it is not difficult to verify that the recurrence formula
below, proposed by Beasley [4], can be used to calculate the value $V(w, h)$ of an
optimal canonical guillotine pattern of a rectangle of dimensions $(w, h)$. In this
formula, $v(w, h)$ denotes the value of the most valuable item that can be cut in
a rectangle of dimensions $(w, h)$, or 0 if no item can be cut in the rectangle.



$$V(w, h) = \max\bigl( v(w, h),\ \{V(w', h) + V(p(w - w'), h) \mid w' \in P\},\ \{V(w, h') + V(w, q(h - h')) \mid h' \in Q\} \bigr). \qquad (1)$$

Thus, if we calculate $V(W, H)$ we have the value of an optimal solution for
an instance $I = (W, H, w, h, v)$.

We can find the discretization points of the width (or height) by means of
explicit enumeration, as we show in the algorithm DEE (Discretization by
Explicit Enumeration) described below. In this algorithm, $D$ represents the width
(or height) of the bin and $d_1, \ldots, d_m$ represent the widths (or heights) of the
items. The algorithm DEE can be implemented to run in $O(m\delta)$ time, where $\delta$
represents the number of integer conic combinations of $d_1, \ldots, d_m$ with value at
most $D$. This means that when we multiply $D, d_1, \ldots, d_m$ by a constant, the
time required by DEE is not affected.

It is easy to construct instances such that $\delta \ge 2^m$. Thus, an explicit
enumeration may take exponential time. But if we can guarantee that $d_i > D/k$
($i = 1, \ldots, m$), the sum of the $m$ coefficients of any integer conic combination
of $d_1, \ldots, d_m$ with value at most $D$ is not greater than $k$. Thus, $\delta$ is at most the
number of $k$-combinations of $m$ objects with repetition. Therefore, for fixed $k$,
$\delta$ is polynomial in $m$ and consequently the algorithm DEE is polynomial in $m$.



Algorithm 3.1 DEE

Input: $D$ (width or height), $d_1, \ldots, d_m$.

Output: a set $P$ of discretization points (of the width or height).

$P = \emptyset$, $k = 0$.

While $k \ge 0$ do

    For $i = k + 1$ to $m$ do $z_i = \lfloor (D - \sum_{j=1}^{i-1} d_j z_j) / d_i \rfloor$.
    $P = P \cup \{\sum_{j=1}^{m} z_j d_j\}$.
    $k = \max(\{i \mid z_i > 0,\ 1 \le i \le m\} \cup \{-1\})$.

    If $k > 0$ then $z_k = z_k - 1$ and $P = P \cup \{\sum_{j=1}^{m} z_j d_j\}$.

Return $P$.
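
The enumeration performed by DEE can be sketched in Python as follows. This is only an illustrative reading of Algorithm 3.1, not the authors' C implementation; the function name `dee` and the 0-based list `z` are assumptions made for the example, and all sizes are assumed to be positive integers.

```python
def dee(D, d):
    """Discretization points of D by explicit enumeration (sketch of DEE)."""
    m = len(d)
    z = [0] * m            # z[i] = multiplicity of size d[i] in the current combination
    points = set()
    k = 0                  # 1-based index of the last positive coefficient; 0 at the start
    while k >= 0:
        # Refill the coefficients k+1..m greedily with the largest feasible values.
        for i in range(k, m):
            used = sum(d[j] * z[j] for j in range(i))
            z[i] = (D - used) // d[i]
        points.add(sum(d[j] * z[j] for j in range(m)))
        # Largest 1-based index with a positive coefficient, or -1 if there is none.
        k = max([i + 1 for i in range(m) if z[i] > 0] + [-1])
        if k > 0:
            z[k - 1] -= 1
            points.add(sum(d[j] * z[j] for j in range(m)))
    return sorted(points)
```

For a bin dimension of 5 and item sizes 2 and 3, the sketch visits the combinations 4, 2, 5, 3 and 0, so every size reachable by whole copies of the items is reported.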



We can also use dynamic programming to find the discretization points. The
basic idea is to solve a knapsack problem in which every item $i$ has weight and
value $d_i$ ($i = 1, \ldots, m$), and the knapsack has capacity $D$. The well-known
dynamic programming technique for the knapsack problem (see [13]) gives the
optimal value of knapsacks with (integer) capacities taking values from 1 to $D$.
It is easy to see that $j$ is a discretization point if and only if the knapsack
with capacity $j$ has optimal value $j$. We have then an algorithm, which we call
DDP (Discretization using Dynamic Programming), described in the sequel.



Algorithm 3.2 DDP

Input: $D$, $d_1, \ldots, d_m$.

Output: a set $P$ of discretization points.

$P = \{0\}$.

For $j = 0$ to $D$ do $c_j = 0$.

For $i = 1$ to $m$ do
    For $j = d_i$ to $D$
        If $c_j < c_{j - d_i} + d_i$ then $c_j = c_{j - d_i} + d_i$.

For $j = 1$ to $D$
    If $c_j = j$ then $P = P \cup \{j\}$.

Return $P$.
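
A direct transcription of DDP into Python might look like the sketch below. The function name `ddp` and the list `c` are choices made for this example; the sizes are assumed to be positive integers, as in the integral instances considered above.

```python
def ddp(D, d):
    """Discretization points of D via the knapsack recurrence (sketch of DDP)."""
    # c[j] = largest total size not exceeding j that the item sizes can reach
    c = [0] * (D + 1)
    for di in d:
        for j in range(di, D + 1):
            if c[j] < c[j - di] + di:
                c[j] = c[j - di] + di
    # j is a discretization point exactly when a knapsack of capacity j can be filled
    return [0] + [j for j in range(1, D + 1) if c[j] == j]
```

The two nested loops make the $O(mD)$ running time visible: the work depends on the numeric value of $D$, which is why scaling an instance to make it integral can be costly.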



We note that the algorithm DDP requires time $O(mD)$. Thus, the scaling (if
needed) to obtain an integral instance may render the use of DDP unsuitable
in practice. On the other hand, the algorithm DDP is suited for instances in
which $D$ is small. If $D$ is large but the dimensions of the items are not so small
compared to the dimension of the bin, the algorithm DDP has a satisfactory
performance. In the computational tests, presented in Section 5, we used the
algorithm DDP.

We describe now the algorithm DP that solves the recurrence formula (1).
We have designed this algorithm in such a way that a pattern corresponding
to an optimal solution can be easily obtained. For that, the algorithm stores
in a matrix, for every rectangle of width $w' \in P$ and height $h' \in Q$,
the direction and the position of the first guillotine cut that has to be made in
this rectangle. In case no cut should be made in the rectangle, the algorithm stores
the item that corresponds to this rectangle.

When the algorithm DP halts, for each rectangle with dimensions $(p_i, q_j)$,
we have that $V(i, j)$ contains the optimal value that can be obtained in this
rectangle, $guillotine(i, j)$ indicates the direction of the first guillotine cut, and
$position(i, j)$ is the position (on the $x$-axis or on the $y$-axis) where the first
guillotine cut has to be made. If $guillotine(i, j) = nil$, then no cut has to be made
in this rectangle. In this case, $item(i, j)$ (if nonzero) indicates which item
corresponds to this rectangle. The value of the optimal solution will be in $V(r, s)$.

Note that each attribution of value to the variable $t$ can be done in
$O(\log r + \log s)$ time by using binary search in the set of the discretization points. If we use
the algorithm DEE to calculate the discretization points, the algorithm DP can
be implemented to have time complexity
$O(\delta_1 + \delta_2 + r^2 s \log r + r s^2 \log s)$, where $\delta_1$
and $\delta_2$ represent the number of integer conic combinations that produce the
discretization points of the width and of the height, respectively.





180 G. Cintra and Y. Wakabayashi 



Algorithm 3.3 DP

Input: An instance $I = (W, H, w, h, v)$ of GCV, where $w = (w_1, \ldots, w_m)$,
$h = (h_1, \ldots, h_m)$ and $v = (v_1, \ldots, v_m)$.
Output: An optimal solution for $I$.

Find $p_1 < \ldots < p_r$, the discretization points of the width $W$.

Find $q_1 < \ldots < q_s$, the discretization points of the height $H$.

For $i = 1$ to $r$
    For $j = 1$ to $s$
        $V(i, j) = \max(\{v_k \mid 1 \le k \le m,\ w_k \le p_i \text{ and } h_k \le q_j\} \cup \{0\})$.
        $item(i, j) = \max(\{k \mid 1 \le k \le m,\ w_k \le p_i,\ h_k \le q_j \text{ and } v_k = V(i, j)\} \cup \{0\})$.
        $guillotine(i, j) = nil$.

For $i = 2$ to $r$
    For $j = 2$ to $s$
        $n = \max(k \mid 1 \le k \le r \text{ and } p_k \le \lfloor p_i / 2 \rfloor)$.
        For $x = 2$ to $n$
            $t = \max(k \mid 1 \le k \le r \text{ and } p_k \le p_i - p_x)$.
            If $V(i, j) < V(x, j) + V(t, j)$ then
                $V(i, j) = V(x, j) + V(t, j)$, $position(i, j) = p_x$ and $guillotine(i, j) =$ 'V'.
        $n = \max(k \mid 1 \le k \le s \text{ and } q_k \le \lfloor q_j / 2 \rfloor)$.
        For $y = 2$ to $n$
            $t = \max(k \mid 1 \le k \le s \text{ and } q_k \le q_j - q_y)$.
            If $V(i, j) < V(i, y) + V(i, t)$ then
                $V(i, j) = V(i, y) + V(i, t)$, $position(i, j) = q_y$ and $guillotine(i, j) =$ 'H'.
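
To make the table-filling order concrete, here is a compact Python sketch of the dynamic program. It is an illustration under stated assumptions rather than the authors' code: the names `disc_points` and `guillotine_value` are invented for the example, indices are 0-based (so index 0 is the discretization point 0), item sizes are positive integers, and the single `cut` table merges the roles of the $guillotine$ and $position$ matrices.

```python
from bisect import bisect_right

def disc_points(D, sizes):
    """Sorted discretization points of D (knapsack-style, as in DDP)."""
    c = [0] * (D + 1)
    for di in sizes:
        for j in range(di, D + 1):
            c[j] = max(c[j], c[j - di] + di)
    return [j for j in range(D + 1) if c[j] == j]

def guillotine_value(W, H, w, h, v):
    """Optimal value for the bin (W, H) and the table of first cuts."""
    p, q = disc_points(W, w), disc_points(H, h)
    r, s = len(p), len(q)
    cut = [[None] * s for _ in range(r)]     # ('V' or 'H', position) of the first cut
    # Base case: most valuable single item fitting in the rectangle (p[i], q[j]).
    V = [[max([v[k] for k in range(len(w)) if w[k] <= p[i] and h[k] <= q[j]] + [0])
          for j in range(s)] for i in range(r)]
    for i in range(1, r):
        for j in range(1, s):
            for x in range(1, r):            # vertical cuts, up to half the width
                if 2 * p[x] > p[i]:
                    break
                t = bisect_right(p, p[i] - p[x]) - 1   # index of p(p[i] - p[x])
                if V[i][j] < V[x][j] + V[t][j]:
                    V[i][j] = V[x][j] + V[t][j]
                    cut[i][j] = ('V', p[x])
            for y in range(1, s):            # horizontal cuts, up to half the height
                if 2 * q[y] > q[j]:
                    break
                t = bisect_right(q, q[j] - q[y]) - 1   # index of q(q[j] - q[y])
                if V[i][j] < V[i][y] + V[i][t]:
                    V[i][j] = V[i][y] + V[i][t]
                    cut[i][j] = ('H', q[y])
    # The last entry corresponds to the rectangle (p(W), q(H)).
    return V[r - 1][s - 1], cut
```

The binary search through `bisect_right` plays the role of the $O(\log r + \log s)$ lookup of $t$ mentioned above.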



For the instances of GCV with $w_i > W/k$ and $h_i > H/k$ ($k$ fixed and
$i = 1, \ldots, m$), we have that $\delta_1$, $\delta_2$, $r$ and $s$ are polynomial in $m$. For such instances
the algorithm DP requires time polynomial in $m$.

We can use a vector $X$ (resp. $Y$), of size $W$ (resp. $H$), and let $X_i$ (resp.
$Y_j$) contain $p(i)$ (resp. $q(j)$). Once the discretization points are calculated, it
requires time $O(W + H)$ to determine the values in the vectors $X$ and $Y$. Using
these vectors, each attribution to the variable $t$ can be done in constant time. In
this case, an implementation of the algorithm DP, using DEE (resp. DDP) as a
subroutine, would have time complexity $O(\delta_1 + \delta_2 + W + H + r^2 s + r s^2)$ (resp.
$O(mW + mH + r^2 s + r s^2)$). In any case, the amount of memory required by the
algorithm DP is $O(rs)$.
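
The lookup vectors can be filled in a single left-to-right sweep. The sketch below is one possible way to do it; `build_lookup` is a name chosen for this example, and it assumes the discretization points are given as a sorted list of distinct integers starting with 0.

```python
def build_lookup(D, points):
    """Return X with X[x] = largest discretization point <= x, for x = 0..D."""
    X = [0] * (D + 1)
    idx = 0                                   # index of the current largest point <= x
    for x in range(D + 1):
        # Points are distinct integers, so at most one new point is reached per step.
        if idx + 1 < len(points) and points[idx + 1] <= x:
            idx += 1
        X[x] = points[idx]
    return X
```

With such a table, $p(p_i - p_x)$ becomes a single array access, which is what removes the logarithmic factors from the running time.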

We can use the algorithm DP to solve the variant of GCV, denoted by
GCV$^r$, in which orthogonal rotations of the items are allowed. For that, given
an instance $I$ of GCV$^r$, we construct another instance (for GCV) as follows. For
each item $i$ in $I$, of width $w_i$, height $h_i$ and value $v_i$, we add another item of
width $h_i$, height $w_i$ and value $v_i$, whenever $w_i \ne h_i$, $w_i \le H$ and $h_i \le W$.
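
The transformation is short enough to state as code. In this sketch the function name `add_rotated_items` is hypothetical and the instance is passed as three parallel lists.

```python
def add_rotated_items(W, H, w, h, v):
    """Extend an instance with the rotated copy of every item that can be rotated."""
    w2, h2, v2 = list(w), list(h), list(v)
    for wi, hi, vi in zip(w, h, v):
        # Skip squares (rotation changes nothing) and items whose rotation does not fit.
        if wi != hi and wi <= H and hi <= W:
            w2.append(hi)
            h2.append(wi)
            v2.append(vi)
    return w2, h2, v2
```

The resulting instance has at most $2m$ items, so the bounds given for the algorithm DP change only by constant factors in $m$.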

4 The Problem GCD and the Column Generation Method

We focus now on the Two-dimensional Guillotine Cutting Problem with Demands
(GCD). First, let us formulate GCD as an ILP (Integer Linear Program).

Let $I = (W, H, w, h, d)$ be an instance of GCD. Represent each pattern $j$ of
the instance $I$ as a vector $p_j$, whose $i$-th entry indicates the number of times item
$i$ occurs in this pattern. The problem GCD consists then in deciding how many
times each pattern has to be used to meet the demands and minimize the total
number of bins that are used. Let $n$ be the number of all possible patterns for $I$,
and let $P$ denote an $m \times n$ matrix whose columns are the patterns $p_1, \ldots, p_n$.
If we denote by $d$ the vector of the demands, then the following is an ILP
formulation for GCD: minimize $x_1 + \ldots + x_n$ subject to $Px = d$ and $x_j \ge 0$ and $x_j$
integer for $j = 1, \ldots, n$. (The variable $x_j$ indicates how many times the pattern
$j$ is selected.)

Gilmore and Gomory [17] proposed a column generation method to solve
the relaxation of the above ILP, shown below. The idea is to start with a few
columns and then generate new columns of $P$ only when they are needed.

$$\begin{array}{ll} \text{minimize} & x_1 + \ldots + x_n \\ \text{subject to} & Px = d \\ & x_j \ge 0, \quad j = 1, \ldots, n. \end{array} \qquad (2)$$

The algorithm DP given in Section 3 finds guillotine patterns. Moreover, if
each item $i$ has value $y_i$ and occurs $z_i$ times in a pattern produced by DP, then
$\sum_{i=1}^{m} y_i z_i$ is maximum. This is exactly what we need to generate new columns.
We describe below the algorithm SimplexCG$_2$ that solves (2).

The computational tests indicated that on the average the number of columns 
generated by SimplexCG 2 was This is in accordance with the theoretical 

results that are known with respect to the average behavior of the Simplex 
method [1,7]. 
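
The revised simplex loop with column generation can be sketched as follows. This is a hedged illustration, not the authors' implementation (which relies on an LP solver): the names `solve` and `simplex_cg` are invented here, exact rational arithmetic is used to avoid tolerance issues, and the pricing routine `price` is a parameter. In the paper's setting `price(y)` would run the algorithm DP with the values $y$; any routine returning a pattern that maximizes $y^T z$ can be plugged in.

```python
from fractions import Fraction

def solve(A, b):
    """Solve the square system A x = b exactly by Gauss-Jordan elimination."""
    n = len(A)
    M = [[Fraction(a) for a in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        M[col] = [a / M[col][col] for a in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]

def simplex_cg(d, price):
    """Minimize sum(x) subject to Px = d, x >= 0, generating columns on demand.

    price(y) must return a pattern z (list of item counts) maximizing y.z.
    Returns the final basis B (as a list of rows) and the basic solution x.
    """
    m = len(d)
    B = [[int(i == j) for j in range(m)] for i in range(m)]   # identity basis
    x = [Fraction(di) for di in d]
    while True:
        y = solve([list(col) for col in zip(*B)], [1] * m)    # y^T B = 1^T
        z = price(y)
        if sum(yi * zi for yi, zi in zip(y, z)) <= 1:
            return B, x                    # no improving column: x is optimal for (2)
        u = solve(B, z)                    # B u = z (the entering direction)
        # Ratio test; ties are broken by the smallest index.
        t, s = min((x[j] / u[j], j) for j in range(m) if u[j] > 0)
        for i in range(m):
            B[i][s] = z[i]
            x[i] = t if i == s else x[i] - u[i] * t
```

The stopping test $y^T z \le 1$ says that no pattern has negative reduced cost, since every column of (2) has cost 1.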

We describe now a procedure to find an optimal integer solution from the
solutions obtained by SimplexCG$_2$. The procedure is iterative. Each iteration
starts with an instance $I$ of GCD and consists basically in solving (2) with
SimplexCG$_2$ obtaining $B$ and $x$. If $x$ is integral, we return $B$ and $x$ and halt.
Otherwise, we calculate $x^* = (x_1^*, \ldots, x_m^*)$, where $x_i^* = \lfloor x_i \rfloor$ ($i = 1, \ldots, m$).
For this new solution, possibly part of the demand of the items is not fulfilled.
More precisely, the demand of each item $i$ that is not fulfilled is
$d_i^* = d_i - \sum_{j=1}^{m} B_{i,j} x_j^*$. Thus, if we take $d^* = (d_1^*, \ldots, d_m^*)$, we have a residual instance
$I^* = (W, H, w, h, d^*)$ (we may eliminate from $I^*$ the items with no demand).

If some $x_i^* > 0$ ($i = 1, \ldots, m$), part of the demand is fulfilled by the solution
$x^*$. In this case, we return $B$ and $x^*$, we let $I = I^*$ and start a new iteration. If
$x_i^* = 0$ ($i = 1, \ldots, m$), no part of the demand is fulfilled by $x^*$. We solve then
the instance $I^*$ with the algorithm Hybrid First Fit (HFF) [10]. We present in

Algorithm 4.1 SimplexCG$_2$

Input: An instance $I = (W, H, w, h, d)$ of GCD, where $w = (w_1, \ldots, w_m)$,
$h = (h_1, \ldots, h_m)$ and $d = (d_1, \ldots, d_m)$.

Output: An optimal solution for (2), where the columns of $P$ are the patterns for $I$.

1 Let $x = d$ and $B$ be the identity matrix of order $m$.

2 Solve $y^T B = \mathbf{1}^T$.

3 Generate a new column $z$ executing the algorithm DP with parameters $W, H, w, h, y$.

4 If $y^T z \le 1$, return $B$ and $x$ and halt ($x$ corresponds to the columns of $B$).

5 Otherwise, solve $Bw = z$.

6 Let $t = \min(x_j / w_j \mid 1 \le j \le m,\ w_j > 0)$.

7 Let $s = \min(j \mid 1 \le j \le m,\ x_j / w_j = t)$.

8 For $i = 1$ to $m$ do

    8.1 $B_{i,s} = z_i$.

    8.2 If $i = s$ then $x_i = t$; otherwise, $x_i = x_i - w_i t$.

9 Go to step 2.



what follows the algorithm CG that implements the iterative procedure we have
described.

Note that in each iteration either part of the demand is fulfilled or we go to
step 4. Thus, after a finite number of iterations the demand will be met (part
of it possibly in step 4). In fact, one can show that step 3.6 of the algorithm
CG is executed at most $m$ times. It should be noted, however, that step 5 of the
algorithm CG may require exponential time. This step is necessary to transform
the representation of the last residual instance into an input for the algorithm
HFF, called in the next step. Moreover, HFF may also take exponential time to
solve this last instance.
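
The rounding and residual-instance steps of CG (steps 2 to 3.4) can be illustrated with the following Python sketch. The function name `residual_instance` and the representation of $B$ as a list of rows are assumptions made for this example.

```python
from math import floor

def residual_instance(B, x, w, h, d):
    """Round x down and build the residual instance left after using x*.

    B is the m x m basis (B[i][j] = copies of item i in pattern j) and x the
    fractional solution with B x = d. Returns x* and the residual (w, h, d).
    """
    m = len(d)
    x_star = [floor(xj) for xj in x]
    # Unmet demand of item i: d_i minus what the rounded-down patterns produce.
    d_star = [d[i] - sum(B[i][j] * x_star[j] for j in range(m)) for i in range(m)]
    keep = [i for i in range(m) if d_star[i] > 0]      # drop items already satisfied
    return (x_star,
            [w[i] for i in keep],
            [h[i] for i in keep],
            [d_star[i] for i in keep])
```

Because $B$ is nonnegative and $x^* \le x$, the residual demands are never negative. Step 5 of CG then expands each remaining item $i$ into $d_i$ separate copies, which is where the exponential dependence on the input size can arise, since demands are encoded in binary.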

We designed an approximation algorithm for GCD, called SH, that makes
use of the concept of semi-homogeneous patterns and has absolute performance
bound 4 (see [12]). The reasons for not using SH, instead of HFF, to solve the
last residual instance are the following: first, to generate a new column with the
algorithm DP requires time that can be exponential in $m$. Thus, CG is already
exponential, even in the average case. Besides that, the algorithm HFF has
asymptotic approximation bound 2.125. Thus, we may expect that using HFF
we may produce solutions of better quality.

On the other hand, if the items are not too small with respect to the bin$^1$, the
algorithm DP can be implemented to require polynomial time (as we mentioned
in Section 3). In this case, we could eliminate steps 4 and 5 of CG and use SH
instead of HFF to solve the last residual instance. The solutions may have worse
quality, but at least, theoretically, the time required by such an algorithm would
be polynomial in $m$, on the average.

We note that the algorithm CG can be used to solve the variant of GCD,
called GCD$^r$, in which orthogonal rotations of the items are allowed. For that,

$^1$ More precisely, for fixed $k$, $w_i > W/k$ and $h_i > H/k$ ($i = 1, \ldots, m$).





Dynamic Programming and Column Generation Based Approaches 183 



Algorithm 4.2 CG

Input: An instance $I = (W, H, w, h, d)$ of GCD, where $w = (w_1, \ldots, w_m)$,
$h = (h_1, \ldots, h_m)$ and $d = (d_1, \ldots, d_m)$.

Output: A solution for $I$.

1 Execute the algorithm SimplexCG$_2$ with parameters $W, H, w, h, d$ obtaining $B$ and $x$.

2 For $i = 1$ to $m$ do $x_i^* = \lfloor x_i \rfloor$.

3 If $x_i^* > 0$ for some $i$, $1 \le i \le m$, then

    3.1 Return $B$ and $x_1^*, \ldots, x_m^*$ (but do not halt).

    3.2 For $i = 1$ to $m$ do

        3.2.1 For $j = 1$ to $m$ do $d_i = d_i - B_{i,j} x_j^*$.

    3.3 Let $m' = 0$.

    3.4 For $i = 1$ to $m$ do

        3.4.1 If $d_i > 0$ then $m' = m' + 1$, $w_{m'} = w_i$, $h_{m'} = h_i$ and $d_{m'} = d_i$.

    3.5 If $m' = 0$ then halt.

    3.6 Let $m = m'$ and go to step 1.

4 $w' = \emptyset$, $h' = \emptyset$.

5 For $i = 1$ to $m$ do

    5.1 For $j = 1$ to $d_i$ do

        5.1.1 $w' = w' \cup \{w_i\}$, $h' = h' \cup \{h_i\}$.

6 Return the solution of algorithm HFF executed with parameters $W, H, w', h'$.



before we call the algorithm DP, in step 3 of SimplexCG$_2$, it suffices to make
the transformation explained at the end of Section 3. We will call SimplexCG$_2^r$
the variant of SimplexCG$_2$ with this transformation. It should be noted, however,
that the algorithm HFF, called in step 6 of CG, does not use the fact that the
items can be rotated.

We designed a simple algorithm for the variant of GCD$^r$ in which all items
have demand 1, called here GCDB$^r$. This algorithm, called First Fit Decreasing
Height using Rotations (FFDHR), has asymptotic approximation bound at most
4, as we have shown in [12]. Substituting the call to HFF with a call to FFDHR,
we obtain the algorithm CGR, which is a specialized version of CG for the problem
GCD$^r$.

We also tested another modification of the algorithm CG (and of CGR).
This is the following: when we solve an instance, and the solution returned
by SimplexCG$_2$ rounded down is equal to zero, instead of submitting this
instance to HFF (or FFDHR), we use HFF (or FFDHR) to obtain a good pattern,
updating the demands, and if there is some item for which the demand is not
fulfilled, we go to step 1.

Note that the basic idea is to perturb the residual instances whose relaxed LP
solution, rounded down, is equal to zero. With this procedure, it is expected that
the solution obtained by SimplexCG$_2$ for the residual instance has more variables
with value greater than 1. The algorithm CG$^p$, described below, incorporates this
modification.

It should be noted that with this modification we cannot guarantee anymore
that we have to make at most $m + 1$ calls to SimplexCG$_2$. It is, however, easy

Algorithm 4.3 CG$^p$

Input: An instance $I = (W, H, w, h, d)$ of GCD, where $w = (w_1, \ldots, w_m)$,
$h = (h_1, \ldots, h_m)$ and $d = (d_1, \ldots, d_m)$.

Output: A solution for $I$.

1 Execute the algorithm SimplexCG$_2$ with parameters $W, H, w, h, d$ obtaining $B$ and $x$.

2 For $i = 1$ to $m$ do $x_i^* = \lfloor x_i \rfloor$.

3 If $x_i^* > 0$ for some $i$, $1 \le i \le m$, then

    3.1 Return $B$ and $x_1^*, \ldots, x_m^*$ (but do not halt).

    3.2 For $i = 1$ to $m$ do

        3.2.1 For $j = 1$ to $m$ do $d_i = d_i - B_{i,j} x_j^*$.

    3.3 Let $m' = 0$.

    3.4 For $i = 1$ to $m$ do

        3.4.1 If $d_i > 0$ then $m' = m' + 1$, $w_{m'} = w_i$, $h_{m'} = h_i$ and $d_{m'} = d_i$.

    3.5 If $m' = 0$ then halt.

    3.6 Let $m = m'$ and go to step 1.

4 $w' = \emptyset$, $h' = \emptyset$.

5 For $i = 1$ to $m$ do

    5.1 For $j = 1$ to $d_i$ do

        5.1.1 $w' = w' \cup \{w_i\}$, $h' = h' \cup \{h_i\}$.

6 Return a pattern generated by the algorithm HFF, executed with parameters
$W, H, w', h'$, that has the smallest waste of area, and update the demands.

7 If there are demands to be fulfilled, go to step 1.



to see that the algorithm CG$^p$ in fact halts, as each time step 1 is executed, the
demand decreases strictly. After a finite number of iterations the demand will
be fulfilled and the algorithm halts (in step 3.5 or step 7).

5 Computational Results 

The algorithms described in Sections 3 and 4 were implemented in the C language,
using Xpress-MP [27] as the LP solver. The tests were run on a computer with
two AMD Athlon MP 1800+ processors, with a clock of 1.5 GHz, 3.5 GB of memory
and the Linux operating system (distribution Debian GNU/Linux 3.0).

The performance of the algorithm DP was tested with the instances of GCV
available in the OR-LIBRARY$^2$ (see Beasley [6] for a brief description of this
library). We considered the 13 instances of GCV, called gcut1, ..., gcut13, available
in this library. For all these instances, with the exception of instance gcut13,
optimal solutions had already been found [4]. We solved this instance to
optimality as well. In these instances the bins are squares, with dimensions between
250 and 3000, and the number of items ($m$) is between 10 and 50. The value of each
item is precisely its area. We show in Figure 2 the optimal solution for gcut13
found by the algorithm DP.

$^2$ http://mscmga.ms.ic.ac.uk/info.html

Fig. 2. The optimal solution for gcut13 found by the algorithm DP.

In Table 1 we show the characteristics of the instances solved and the
computational results. The column “Waste” gives, for each solution found, the
percentage of the area of the bin that does not correspond to any item. Each
instance was solved 100 times; the column “Time” shows the average CPU time
in seconds over these 100 runs.
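
Since the value of each item in these instances equals its area, the “Waste” column can be recomputed from the optimal value and the bin dimensions. The helper below is a small illustrative sketch (the name `waste_percent` is ours), checked against the gcut1 and gcut13 rows of Table 1.

```python
def waste_percent(W, H, value):
    """Percentage of the bin area not covered by items, when value = covered area."""
    area = W * H
    return 100.0 * (area - value) / area

# gcut1: bin (250, 250), optimal value 56460
print(round(waste_percent(250, 250, 56460), 3))      # 9.664
# gcut13: bin (3000, 3000), optimal value 8997780
print(round(waste_percent(3000, 3000, 8997780), 3))  # 0.025
```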

We have also tested the algorithm DP for the instances gcut1, ..., gcut13,
this time allowing rotations (we called these instances gcut1r, ..., gcut13r).
Owing to space limitations, we omit the table showing the computational results.
It can be found at http://www.ime.usp.br/~glauber/gcut. We only
remark that for some instances the time increased (it did not double) but the
waste decreased, as one would expect.

We did not find instances for GCD in the OR-LIBRARY. We have then tested
the algorithms CG and CG$^p$ with the instances gcut1, ..., gcut12, associating with
each item $i$ a randomly generated demand $d_i$ between 1 and 100. We called these
instances gcut1d, ..., gcut12d.

We show in Table 2 the computational results obtained with the algorithm
CG. In this table, LB denotes the lower bound (given by the solution of (2))
for the value of an optimal integer solution. Each instance was solved 10 times;
the column “Average Time” shows the average time considering these 10
experiments.

The algorithm CG found optimal or quasi-optimal solutions for all these
instances. On the average, the difference between the solution found by CG and
the lower bound (LB) was only 0.161%. We note also that the time spent to
solve these instances was satisfactory. Moreover, the gain of the solution of CG

Table 1. Performance of the algorithm DP.

Instance  Quantity  Dimensions       r     s     Optimal   Waste    Time
          of items  of the bin                   Solution           (sec)
gcut1     10        (250, 250)       19    68    56460     9.664%   0.003
gcut2     20        (250, 250)       112   95    60536     3.142%   0.010
gcut3     30        (250, 250)       107   143   61036     2.342%   0.012
gcut4     50        (250, 250)       146   146   61698     1.283%   0.022
gcut5     10        (500, 500)       76    39    246000    1.600%   0.004
gcut6     20        (500, 500)       120   95    238998    4.401%   0.008
gcut7     30        (500, 500)       126   179   242567    2.973%   0.017
gcut8     50        (500, 500)       262   225   246633    1.347%   0.062
gcut9     10        (1000, 1000)     41    91    971100    2.890%   0.006
gcut10    20        (1000, 1000)     155   89    982025    1.798%   0.009
gcut11    30        (1000, 1000)     326   238   980096    1.990%   0.066
gcut12    50        (1000, 1000)     363   398   979986    2.001%   0.140
gcut13    32        (3000, 3000)     2425  1891  8997780   0.025%   144.915



compared to the solution of HFF was 8.779%, on the average, a very significant
improvement.
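
The two relative measures reported in the tables can be reproduced with the short sketch below. The formulas are inferred from the tabulated values (for example the gcut3d and gcut2d rows of Table 2) rather than stated explicitly in the text, and the function names are ours.

```python
def percent_above_lower_bound(solution, lb):
    """Relative gap between the number of bins used and the LP lower bound."""
    return 100.0 * (solution - lb) / lb

def percent_improvement(solution, baseline):
    """Relative saving in bins with respect to a baseline heuristic such as HFF."""
    return 100.0 * (baseline - solution) / baseline

# gcut3d: 333 bins against LB = 332
print(round(percent_above_lower_bound(333, 332), 3))   # 0.301
# gcut2d: 345 bins against 402 bins for HFF
print(round(percent_improvement(345, 402), 3))         # 14.179
```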

We have also used the algorithm CG$^p$ to solve the instances gcut1d, ...,
gcut12d. The results are shown in Table 3. We note that the number of columns
generated increased approximately 40%, on the average, and the time spent
increased approximately 15%, on the average. On the other hand, an optimal
solution was found for the instance gcut4d.

We also considered the instances gcut1d, ..., gcut12d as being for the
problem GCD$^r$ (rotations are allowed), and called them gcut1dr, ..., gcut12dr. We
omit the table with the computational results we have obtained (the reader can
find it at the URL we mentioned before). We only remark that the algorithm
CGR found optimal or quasi-optimal solutions for all instances. The difference
between the value found by CGR and the lower bound (LB) was only 0.408%,
on the average.

Comparing the value of the solutions obtained by CGR with the solutions
obtained by FFDHR, we note that there was an improvement of 12.147%, on the
average. This improvement would be 16.168% if compared with the solution
obtained by HFF.

The instances gcut1dr, ..., gcut12dr were also tested with the algorithm
CGR$^p$. The computational results are shown in Table 4. We remark that the
performance of the algorithm CGR$^p$ was a little better than the performance
of CGR, with respect to the quality of the solutions. The difference between
the value of the solution obtained by CGR$^p$ and the lower bound decreased to
0.237%, on the average. The gain in quality was obtained at the cost of
an increase of approximately 97% (on the average) in the number of columns
generated and of an increase of approximately 44% in time.








Table 2. Performance of the algorithm CG.

Instance  Solution  LB    Difference  Average     Columns    Solution  Improvement
          of CG           from LB     Time (sec)  Generated  of HFF    over HFF
gcut1d    294       294   0.000%      0.059       9          295       0.339%
gcut2d    345       345   0.000%      0.585       68         402       14.179%
gcut3d    333       332   0.301%      2.340       274        393       14.834%
gcut4d    838       836   0.239%      11.693      820        977       11.323%
gcut5d    198       197   0.507%      0.088       18         198       0.000%
gcut6d    344       343   0.291%      0.362       101        418       17.308%
gcut7d    591       591   0.000%      1.184       136        615       4.523%
gcut8d    691       690   0.145%      30.361      952        764       9.555%
gcut9d    131       131   0.000%      0.068       11         143       7.092%
gcut10d   293       293   0.000%      0.172       20         335       12.537%
gcut11d   331       330   0.303%      8.570       222        353       6.232%
gcut12d   673       672   0.149%      39.032      485        727       7.428%



6 Concluding Remarks 

In this paper we presented algorithms for the problems GCV and GCD and
their variants GCV$^r$ and GCD$^r$. For the problem GCV we presented the (exact)
pseudo-polynomial algorithm DP. This algorithm can use either the algorithm
DEE or DDP to generate the discretization points. Both of these algorithms
were described. We have also shown that these algorithms can be implemented
to run in polynomial time when the items are not so small compared to the size
of the bin. In this case the algorithm DP also runs in polynomial time. We have
also mentioned how to use DP to solve the problem GCV$^r$.

We presented two column generation based algorithms to solve GCD: CG and
CG$^p$. Both use the algorithm DP to generate the columns: the first uses the
algorithm HFF to solve the last residual instance and the second uses a perturbation
strategy. The algorithm CG combines different techniques: Simplex method with
column generation, an exact algorithm for the discretization points, and an
approximation algorithm (HFF) for the last residual instance. An approach of this
nature has proved to be promising, and has been used in the one-dimensional
cutting problem with demands [26,11].

The algorithm CG$^p$ is a variant of CG, in which we use an idea that consists
in perturbing the residual instances. We have also designed the algorithms CGR
and CGR$^p$, for the problem GCD$^r$, a variation of GCD in which orthogonal
rotations are allowed. The algorithm CGR uses as a subroutine the algorithm
FFDHR, which we have designed.

We noted that the algorithms CG and CGR are polynomial, on the average,
when the items are not so small compared to the size of the bin. The
computational results with these algorithms were very satisfactory: optimal or
quasi-optimal solutions were found for the instances we have considered. As expected,

Table 3. Performance of the algorithm CG$^p$.

Instance  Solution   LB    Difference  Average     Columns    Solution  Improvement
          of CG$^p$        from LB     Time (sec)  Generated  of HFF    over HFF
gcut1d    294        294   0.000%      0.034       9          295       0.339%
gcut2d    345        345   0.000%      0.552       68         402       14.179%
gcut3d    333        332   0.301%      3.814       492        393       14.834%
gcut4d    837        836   0.120%      16.691      1271       977       11.429%
gcut5d    198        197   0.507%      0.086       25         198       0.000%
gcut6d    344        343   0.291%      0.400       121        418       17.308%
gcut7d    591        591   0.000%      1.202       136        615       4.523%
gcut8d    691        690   0.145%      32.757      1106       764       9.555%
gcut9d    131        131   0.000%      0.042       11         143       7.092%
gcut10d   293        293   0.000%      0.153       20         335       12.537%
gcut11d   331        330   0.303%      10.875      416        353       6.232%
gcut12d   673        672   0.149%      42.616      692        727       7.428%



CG$^p$ (respectively CGR$^p$) found solutions of slightly better quality than CG
(respectively CGR) at the cost of a slight increase in the running time.

We exhibit in Table 5 the list of the algorithms we proposed in this paper. 

A natural development of our work would be to adapt the approach used in
the algorithm CG to the two-dimensional cutting stock problem with demands
(CSD), a variant of GCD in which the cuts need not be guillotine. One can find
an initial solution using homogeneous patterns; the columns can be generated
using any of the algorithms that have appeared in the literature for the
two-dimensional cutting stock problem with value [5,2]. To solve the last residual
instance one can use approximation algorithms [10,8,20].

One can also use column generation for the variant of CSD in which the 
quantity of items in each bin is bounded. This variant, proposed by Christofides 
and Whitlock [9], is called restricted two-dimensional cutting stock problem. Each 
new column can be generated with any of the known algorithms for the restricted 
two-dimensional cutting stock problem with value [9,24], and the last residual 
instance can be solved with the algorithm HFF. This restricted version with guil- 
lotine cut requirement can also be solved using the ideas we have just described: 
the homogeneous patterns and the patterns produced by HFF can be obtained 
with guillotine cuts, and the columns can be generated with the algorithm of 
Cung, Hifi and Le Cun [16]. 

A more audacious step would be to adapt the column generation approach
for the three-dimensional cutting stock problem with demands. For the initial
solutions one can use homogeneous patterns. The last residual instances can be
dealt with using approximation algorithms for the three-dimensional bin
packing problem [3,14,15,22,23,21,25]. We do not know, however, of exact algorithms to
generate columns. If we require the cuts to be guillotine, one can adapt the
algorithm DP to generate new columns.

Table 4. Performance of the algorithm CGR$^p$.

Instance   Solution    LB    Difference  Average     Columns    Solution  Improvement
           of CGR$^p$        from LB     Time (sec)  Generated  of FFDHR  over FFDHR
gcut1dr    291         291   0.000%      0.070       5          291       0.000%
gcut2dr    283         282   0.355%      5.041       252        330       14.242%
gcut3dr    314         313   0.319%      10.006      740        355       11.549%
gcut4dr    837         836   0.120%      25.042      1232       945       11.429%
gcut5dr    175         174   0.575%      0.537       58         200       12.500%
gcut6dr    301         301   0.000%      2.884       175        405       25.679%
gcut7dr    542         542   0.000%      4.098       121        599       9.516%
gcut8dr    651         650   0.153%      68.410      1154       735       11.429%
gcut9dr    123         122   0.820%      0.494       42         140       12.143%
gcut10dr   270         270   0.000%      1.546       33         330       18.182%
gcut11dr   299         298   0.336%      86.285      686        329       9.119%
gcut12dr   602         601   0.166%      181.056     945        682       11.730%



Table 5. Algorithms proposed in this paper. 

Algorithm   Problems                Comments
DP          GCV and GCVr            Polynomial for large items
DEE         Discretization points   Polynomial for large items
DDP         Discretization points
CG          GCD                     Polynomial, on the average, for large items
CGp         GCD
CGR         GCDr                    Polynomial, on the average, for large items
CGRp        GCDr

Abstract. In this paper, we report on our own experience in studying 
a fundamental problem on graphs: all pairs shortest paths. In particu- 
lar, we discuss the interplay between theory and practice in engineering 
a simple variant of Dijkstra’s shortest path algorithm. In this context, 
we show that studying heuristics that are efficient in practice can yield 
interesting clues to the combinatorial properties of the problem, and 
eventually lead to new theoretically efficient algorithms. 

1 Introduction 

The quest for efficient computer programs for solving real world problems has led 
in recent years to a growing interest in experimental studies of algorithms. Pro- 
ducing efficient implementations requires taking into account issues such as mem- 
ory hierarchy effects, hidden constant factors in the performance bounds, impli- 
cations of communication complexity, numerical precision, and use of heuristics, 
which are sometimes overlooked in classical analysis models. On the other hand, 
developing and assessing heuristics and programming techniques for producing 
codes that are efficient in practice is a difficult task that requires a deep un- 
derstanding of the mathematical structure and the combinatorial properties of 
the problem at hand. In this context, experiments can raise new conjectures and 
theoretical questions, opening unexplored research directions that may lead to 
further theoretical improvements and eventually to more practical algorithms. 
The whole process of designing, analyzing, implementing, tuning, debugging and 
experimentally evaluating algorithms is usually referred to as Algorithm Engi- 
neering. As shown in Figure 1, algorithm engineering is a cyclic process: designing 
algorithmic techniques and analyzing their performance according to theoretical 
models provides a sound foundation to writing efficient computer programs. On 
the other hand, analyzing the practical performance of a program can be helpful 
in spotting bottlenecks in the code, designing heuristics, refining the theoretical 
analysis, and even devising more realistic cost models in order to get a deeper 
insight into the problem at hand. We refer the interested reader to [4] for a 
broader survey of algorithm engineering issues. 

* Work partially supported by the Sixth Framework Programme of the EU under con- 
tract number 507613 (Network of Excellence “EuroNGI: Designing and Engineering 
of the Next Generation Internet”), by the IST Programme of the EU under contract 



[Figure 1: a cycle connecting Algorithm design, Theoretical analysis, Algorithm implementation, and Experimental analysis; the arrows are labeled "Deeper insights", "Bottlenecks, Heuristics", "More realistic models", and "Hints to refine analysis".]

Fig. 1. The algorithm engineering cycle. 



In this paper, we report on our own experience in studying a fundamental 
problem on graphs: the all pairs shortest paths problem. We discuss the interplay 
between theory and practice in engineering a variant of Dijkstra’s shortest path 
algorithm. In particular, we present a simple heuristic that can substantially 
improve the practical performance of the algorithm on many typical instances. We 
then show that a deeper study of this heuristic can reveal interesting combi- 
natorial properties of paths in a graph. Surprisingly, exploiting such properties 
can lead to new theoretically efficient methods for updating shortest paths in a 
graph subject to dynamic edge weight changes. 

2 From Theory to Experiments: Engineering Dijkstra’s 
Algorithm 

In 1959, Edsger Wybe Dijkstra devised a simple algorithm for computing shortest 
paths in a graph [6]. Although more recent advances in data structures led 
to faster implementations (see, e.g., the Fibonacci heaps of Fredman and Tarjan 
[7]), Dijkstra’s algorithmic invention is still in place after 45 years, providing 
a ubiquitous standard framework for designing efficient shortest path algorithms. 
If priority queues with constant amortized time per decrease are used, 
the basic method computes the shortest paths from a given source vertex in 
O(m + n log n) worst-case time in a graph with n vertices and m edges. To compute 
all pairs shortest paths, we get an O(mn + n^2 log n) bound in the worst 
case by simply repeating the single-source algorithm from each vertex. 

In Figure 2, we show a simple variant of Dijkstra’s algorithm where all pairs 
shortest paths are computed in an interleaved fashion, rather than one source 
at a time. The algorithm maintains a matrix d that contains at any time an 
upper bound to the distances in the graph. The upper bound d[x, y] for any pair 
of vertices (x, y) is initially equal to the edge weight w(x, y) if there is an edge 
between x and y, and +∞ otherwise. The algorithm also maintains in a priority 
queue H each pair (x, y) with priority d[x, y]. The main loop of the algorithm 
repeatedly extracts from H a pair (x, y) with minimum priority, and tries to 
extend at each iteration the corresponding shortest path by exactly one edge in 
every possible direction. This requires scanning all edges leaving y and entering 
x, performing the classical relaxation step to decrease the distance upper bounds 
d. It is not difficult to prove that at the end of the procedure, if edge weights 
are non-negative, d contains the exact distances. The time required for loading 
and unloading the priority queue is O(n^2 log n) in the worst case. Each edge 
is scanned at most O(n) times and for each scanned edge we spend constant 
amortized time if H is, e.g., a Fibonacci heap. This yields O(mn + n^2 log n) 
worst-case time. 

algorithm DijkstraAPSP(graph G = (V, E, w)) → matrix 

 1. let d be an n × n matrix and let H be an empty priority queue 
 2. for each pair (x, y) ∈ V × V do {initialization} 
 3.     if (x, y) ∈ E then d[x, y] ← w(x, y) else d[x, y] ← +∞ 
 4.     add edge (x, y) to H with priority d[x, y] 
 5. while H is not empty do {main loop} 
 6.     extract from H a pair (x, y) with minimum priority 
 7.     for each edge (y, z) ∈ E leaving y do {forward extension} 
 8.         if d[x, y] + w(y, z) < d[x, z] then {relaxation} 
 9.             d[x, z] ← d[x, y] + w(y, z) 
10.             decrease the priority of (x, z) in H to d[x, z] 
11.     for each edge (z, x) ∈ E entering x do {backward extension} 
12.         if w(z, x) + d[x, y] < d[z, y] then {relaxation} 
13.             d[z, y] ← w(z, x) + d[x, y] 
14.             decrease the priority of (z, y) in H to d[z, y] 
15. return d 

Fig. 2. All pairs variant of Dijkstra’s algorithm. w(x, y) denotes the weight of edge 
(x, y) in G. 
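
The all pairs variant of Figure 2 can be sketched in Python roughly as follows. This is a minimal illustration, not the authors' C implementation: the function name `dijkstra_apsp` and the edge-dictionary input format are assumptions, the diagonal is set to 0 (the pseudocode leaves it at +∞), and the decrease-priority operation is emulated by pushing a fresh entry on a binary heap and skipping stale ones, so the Fibonacci-heap bound quoted above does not apply to this sketch.

```python
import heapq
from math import inf


def dijkstra_apsp(n, edges):
    """Interleaved all pairs variant of Dijkstra's algorithm (cf. Figure 2).

    n     -- number of vertices, labelled 0..n-1 (hypothetical input format)
    edges -- dict mapping (u, v) to a non-negative weight w(u, v)
    Returns an n x n matrix d with d[x][y] = distance from x to y.
    """
    out_edges = [[] for _ in range(n)]  # out_edges[y] = [(z, w(y, z)), ...]
    in_edges = [[] for _ in range(n)]   # in_edges[x]  = [(z, w(z, x)), ...]
    for (u, v), w in edges.items():
        out_edges[u].append((v, w))
        in_edges[v].append((u, w))

    # Initialization: d[x][y] = w(x, y) for edges, +inf otherwise.
    d = [[inf] * n for _ in range(n)]
    for (u, v), w in edges.items():
        d[u][v] = w
    for x in range(n):
        d[x][x] = 0  # assumption of this sketch: the empty path has length 0

    # Pairs at +inf are pushed lazily, the first time they are relaxed.
    heap = [(d[x][y], x, y) for x in range(n) for y in range(n) if d[x][y] < inf]
    heapq.heapify(heap)
    done = [[False] * n for _ in range(n)]

    while heap:
        prio, x, y = heapq.heappop(heap)
        if done[x][y] or prio > d[x][y]:
            continue  # stale entry left behind by an emulated decrease
        done[x][y] = True
        for z, w in out_edges[y]:          # forward extension
            if d[x][y] + w < d[x][z]:
                d[x][z] = d[x][y] + w
                heapq.heappush(heap, (d[x][z], x, z))
        for z, w in in_edges[x]:           # backward extension
            if w + d[x][y] < d[z][y]:
                d[z][y] = w + d[x][y]
                heapq.heappush(heap, (d[z][y], z, y))
    return d
```

For instance, on the four-vertex graph with edges 0→1, 1→2, 2→3 of weight 1 and 0→2 of weight 3, the call `dijkstra_apsp(4, {(0, 1): 1, (1, 2): 1, (0, 2): 3, (2, 3): 1})` yields `d[0][2] == 2` and `d[0][3] == 3`.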

While faster algorithms exist for very sparse graphs [8,9,10], Dijkstra’s al- 
gorithm appears to be still a good practical choice in many real world settings. 
For this reason, the quest for fast implementations has motivated researchers to 
study methods for speeding up Dijkstra’s basic method based on priority queues 
and relaxation. For instance, it is now well understood that efficient data struc- 
tures play a crucial role in the case of sparse graphs, while edge scanning is the 
typical bottleneck in dense graphs [1]. In the rest of this section, we focus on 
the algorithm of Figure 2, and we investigate heuristic methods for reducing the 
number of scanned edges. 

Consider line 7 of Figure 2 (or, similarly, line 11): the loop scans all edges 
(y, z), seeking tentative shortest paths of the form x ⇝ y → z with weight 
d[x, y] + w(y, z). Do we really need to scan all edges (y, z) leaving y? More to 
the point, is there any way to avoid scanning an edge (y, z) if it cannot belong 
to a shortest path from x to z? Let π_xy be a shortest path from x to y and 
let π_xz = π_xy · (y, z) be the path obtained by going from x to y via π_xy and 
then from y to z via edge (y, z) (see Figure 3). Consider now the well-known 
optimal-substructure property of shortest paths (see e.g., [2]): 




Lemma 1. Every subpath of a shortest path is a shortest path. 

This property implies that, if π_xz is a shortest path and we remove either one 
of its endpoints, we still get a shortest path. Thus, edge (y, z) can be the last 
edge of a shortest path π_xz only if both π_xy ⊂ π_xz and π_az ⊂ π_xz are shortest 
paths, where (x, a) is the first edge in π_xz. Let us now exploit this property in 
our shortest path algorithm in order to avoid scanning “unnecessary” edges. In 
the following, we assume without loss of generality that shortest paths in the 
graph are unique. We can just maintain for each pair of vertices (x, y) a list of 
edges R_xy = { (y, z) s.t. π_xz = π_xy · (y, z) is a shortest path }, and a list of 
edges L_xy = { (z, x) s.t. π_zy = (z, x) · π_xy is a shortest path }, where π_xy is the 
shortest path from x to y. Such lists can be easily constructed incrementally as 
the algorithm runs, at each pair extraction from the priority queue H. Suppose 
now that we modify line 7 and line 11 of Figure 2 to scan just edges in R_ay and L_xb, 
respectively, where (x, a) is the first edge and (b, y) is the last edge in π_xy. It is 
not difficult to see that in this way we consider in the relaxation step only paths 
whose proper subpaths are shortest paths. We call such paths locally shortest [5]: 

Definition 1. A path π_xy is locally shortest in G if either: 

(i) π_xy consists of a single vertex, or 
(ii) every proper subpath of π_xy is a shortest path in G. 

With the modifications above, the algorithm requires O(|LSP| + n^2 log n) worst- 
case time, where |LSP| is the number of locally shortest paths in the graph. A 
natural question seems to be: how many locally shortest paths can we have in a 
graph? The following lemma is from [5]: 

Lemma 2. If shortest paths are unique in G, then there can be at most mn 
locally shortest paths in G. This bound is tight. 

[Figure 4: plots of the average number of locally shortest paths per pair of vertices; panel (a) is titled "Locally shortest paths in random graphs (500 nodes)".]

Fig. 4. Average number of locally shortest paths connecting a pair of vertices in: (a) 
a family of random graphs with increasing density; (b) a suite of US road networks 
obtained from ftp://edcftp.cr.usgs.gov. 



This implies that our modification of Dijkstra’s algorithm does not produce an 
asymptotically faster method. But what about typical instances? In [3], we have 
performed some counting experiments on both random and real-world graphs 
(including road networks and Internet Autonomous Systems subgraphs), and we 
have discovered that in these graphs |LSP| tends to be surprisingly close 
to n^2. In Figure 4, we show the average number of locally shortest paths 
connecting a pair of vertices in a family of random graphs with 500 vertices and 
increasing number of edges, and in a suite of US road networks. According to 
these experiments, the computational savings that we might expect using locally 
shortest paths in Dijkstra’s algorithm increase as the edge density grows. 
In Figure 5, we compare the actual running time of a C implementation of the 
algorithm given in Figure 2 (S-DIJ), and the same algorithm with the locally 
shortest path heuristic described above (S-LSP). Our implementations are 
described in [3] and are available at: http://www.dis.uniroma1.it/~demetres/experim/dsp/. 
Notice that in a random graph with density 20% S-LSP can be 
16 times faster than S-DIJ. This confirms our expectations based on the counting 
results given in Figure 4. On very sparse graphs, however, S-LSP appears to 
be slightly slower than S-DIJ due to the data structure overhead of maintaining 
lists L_xy and R_xy for each pair of vertices x and y. 
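
To make the notion of locally shortest path concrete, the following Python sketch counts such paths by brute force on a tiny graph. It is only an illustration of Definition 1 and is unrelated to the counting code used in [3]: the function names are hypothetical, only simple paths with at least one edge are enumerated, integer weights are assumed so that the equality tests are exact, and the enumeration is exponential in n. By Lemma 1, it is enough to test the two subpaths obtained by removing the first or the last vertex.

```python
from itertools import permutations
from math import inf


def floyd_warshall(n, edges):
    """Plain cubic all pairs distances, used here only as a reference."""
    d = [[0 if i == j else inf for j in range(n)] for i in range(n)]
    for (u, v), w in edges.items():
        d[u][v] = min(d[u][v], w)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


def path_weight(path, edges):
    return sum(edges[(u, v)] for u, v in zip(path, path[1:]))


def is_shortest(path, edges, d):
    # A single-vertex path has weight 0 and is trivially shortest.
    return path_weight(path, edges) == d[path[0]][path[-1]]


def locally_shortest_paths(n, edges):
    """Simple paths with at least one edge whose proper subpaths are shortest."""
    d = floyd_warshall(n, edges)
    result = []
    for k in range(2, n + 1):
        for path in permutations(range(n), k):
            if any((u, v) not in edges for u, v in zip(path, path[1:])):
                continue  # not a path in the graph
            # Lemma 1: checking the two maximal proper subpaths suffices.
            if is_shortest(path[1:], edges, d) and is_shortest(path[:-1], edges, d):
                result.append(path)
    return result
```

On the graph with edges 0→1, 1→2, 2→3 of weight 1 and 0→2 of weight 3, the sketch reports seven locally shortest paths: the four edges, 0→1→2, 1→2→3 and 0→1→2→3. The edge 0→2 is locally shortest without being shortest, while 0→2→3 is rejected because its subpath 0→2 is not shortest.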

[Figure 5: plot titled "Experiment for increasing edge density (rnd, 500 nodes, Intel Xeon)"; horizontal axis: Number of edges (x 1000).]

Fig. 5. Comparison of the actual running time of a C implementation of Dijkstra’s 
algorithm with (S-LSP) and without (S-DIJ) the locally shortest paths heuristic in a 
family of random graphs with 500 vertices, increasing edge density, and integer edge 
weights in [1, 1000]. Experiments were performed on an Intel Xeon 500MHz, 512KB L2 
cache, 512MB RAM. 

3 Back from Engineering to Theory: A New Dynamic 
Algorithm 

Let us now consider a scenario in which the input graph changes over time, 
subject to a sequence of edge weight updates. The goal of a dynamic shortest 
paths algorithm is to update the distances in the graph more efficiently than 
recomputing the whole solution from scratch after each change of the weight of 
an edge. In this section, we show that the idea of using locally shortest paths, 
which appears to be a useful heuristic for computing all pairs shortest paths as 
we have seen in Section 2, can play a crucial role in designing asymptotically 
fast update algorithms for the dynamic version of the problem. Let us consider 
the case where edge weights can only be increased (the case where edge weights 
can only be decreased is analogous). Notice that, after increasing the weight of 
an edge, some of the shortest paths containing it may stop being shortest, while 
other paths may become shortest, replacing the old ones. The goal of a dynamic 
update algorithm is to find such replacement paths efficiently. Intuitively, a locally 
shortest path is either shortest itself, or it just falls short of being shortest. 
Locally shortest paths are therefore natural candidates for being replacement 
paths after an edge weight increase. A possible approach could be to keep in a 
data structure all the locally shortest paths of a graph, so that a replacement 
path can be found quickly after an edge update. On the other hand, keeping 
such a data structure up to date should not be too expensive. To understand if 
this is possible at all, we first need to answer the following question: how many 
paths can start being locally shortest and how many paths can stop being locally 
shortest after an edge weight increase? The following theorem from [5] answers 
this question: 

Theorem 1. Let G be a graph subject to a sequence of edge weight increases. If 
shortest paths are unique in G, then during each update: 

(1) O(n^2) paths can stop being locally shortest; 

(2) O(n^2) paths can start being locally shortest, amortized over Ω(n) operations. 

According to Theorem 1, we might hope to maintain explicitly the set of 
locally shortest paths in a graph in quadratic amortized time per operation. 
Since shortest paths in a graph are locally shortest, maintaining such a 
set would allow us to keep information also about shortest paths. As a matter 
of fact, there exists a dynamic variant of the algorithm given in Figure 2 that 
is able to update the locally shortest paths (and thus the shortest paths) of 
a graph in O(n^2 log n) amortized time per edge weight increase; details can 
be found in [5]. For graphs with m = Ω(n log n) edges, this is asymptotically 
faster than recomputing the solution from scratch using Dijkstra’s algorithm. 
Furthermore, if the distance matrix has to be maintained explicitly, this is only 
a polylogarithmic factor away from the best possible bound. Surprisingly, no 
previous result was known for this problem until recently despite three decades 
of research on this topic. 

To solve the dynamic all pairs shortest paths problem in its generality, where 
the sequence of updates can contain both increases and decreases, locally shortest 
paths can no longer be used directly: indeed, if increases and decreases can 
be intermixed, we may have worst-case situations with Ω(mn) changes in the 
set of locally shortest paths during each update. However, using a generalization 
of locally shortest paths, which encompasses the history of the update sequence 
to cope with pathological instances, we devised a method for updating shortest 
paths in O(n^2 log^3 n) amortized time per update [5]. This bound has been 
recently improved to O(n^2 (log n + log^2(m/n))) by Thorup [11]. 

4 Conclusions 

In this paper, we have discussed our own experience in engineering different 
all pairs shortest paths algorithms based on Dijkstra’s method. The interplay 
between theory and practice yielded significant results. We have shown that 
the novel notion of locally shortest paths, which allowed us to design a useful 
heuristic for improving the practical performances of Dijkstra’s algorithm on 
dense graphs, led in turn to the first general efficient dynamic algorithms for 
maintaining the all pairs shortest paths in a graph. 

Despite decades of research, many aspects of the shortest paths problem are 
still far from being fully understood. For instance, can we compute all pairs 
shortest paths in o(mn) time in dense graphs? As another interesting open problem, 
can we update a shortest paths tree asymptotically faster than recomputing it 
from scratch after an edge weight change? 

References 

1. B.V. Cherkassky, A.V. Goldberg, and T. Radzik. Shortest paths algorithms: The- 
ory and experimental evaluation. Mathematical Programming, 73:129-174, 1996. 

2. T.H. Cormen, C.E. Leiserson, R.L. Rivest, and C. Stein. Introduction to Algo- 
rithms. McGraw-Hill, 2001. 

3. C. Demetrescu, S. Emiliozzi, and G.F. Italiano. Experimental analysis of dynamic 
all pairs shortest path algorithms. In Proceedings of the 15th Annual ACM-SIAM 
Symposium on Discrete Algorithms (SODA’04), 2004. 

4. C. Demetrescu, I. Finocchi, and G.F. Italiano. Algorithm engineering. The Algo- 
rithmics Column (J. Diaz), Bulletin of the EATCS, 79:48-63, 2003. 

5. C. Demetrescu and G.F. Italiano. A new approach to dynamic all pairs shortest 
paths. In Proceedings of the 35th Annual ACM Symposium on Theory of Computing 
(STOC’03), San Diego, CA, pages 159-166, 2003. 







C. Demetrescu and G.F. Italiano 



6. E.W. Dijkstra. A note on two problems in connexion with graphs. Numerische 
Mathematik, 1:269-271, 1959. 

7. M.L. Fredman and R.E. Tarjan. Fibonacci heaps and their use in improved network 
optimization algorithms. Journal of the ACM, 34:596-615, 1987. 

8. S. Pettie. A faster all-pairs shortest path algorithm for real-weighted sparse graphs. 
In Proceedings of 29th International Colloquium on Automata, Languages, and 
Programming (ICALP’02), LNCS Vol. 2380, pages 85-97, 2002. 

9. S. Pettie and V. Ramachandran. Computing shortest paths with comparisons and 
additions. In Proceedings of the 13th Annual ACM-SIAM Symposium on Discrete 
Algorithms (SODA’02), pages 267-276. SIAM, January 6-8 2002. 

10. S. Pettie, V. Ramachandran, and S. Sridhar. Experimental evaluation of a new 
shortest path algorithm. In 4th Workshop on Algorithm Engineering and Experiments 
(ALENEX’02), LNCS Vol. 2409, pages 126-142, 2002. 

11. M. Thorup. Tighter fully-dynamic all pairs shortest paths, 2003. Unpublished 
manuscript. 




How to Tell a Good Neighborhood from a Bad 
One: Satisfiability of Boolean Formulas 



Tassos Dimitriou^1 and Paul Spirakis^2 

^1 Athens Information Technology, Greece. 
tassos@ait.gr 

^2 Computer Technology Institute, Greece. 
spirakis@cti.gr 



Abstract. One of the major problems algorithm designers usually face 
is to know in advance whether a proposed optimization algorithm is 
going to behave as planned, and if not, what changes are to be made 
to the way new solutions are examined so that the algorithm performs 
nicely. In this work we develop a methodology for differentiating good 
neighborhoods from bad ones. As a case study we consider the structure 
of the space of assignments for random 3-SAT formulas and we compare 
two neighborhoods, a simple one and a more refined one for which we already know 
that the corresponding algorithm behaves extremely well. We give evidence 
that it is possible to tell in advance what neighborhood structure will 
give rise to a good search algorithm and we show how our methodology 
could have been used to discover some recent results on the structure 
of the SAT space of solutions. We use as a tool “Go with the winners”, 
an optimization heuristic that uses many particles that independently 
search the space of all possible solutions. By gathering statistics, we 
compare the combinatorial characteristics of the different neighborhoods 
and we show that there are certain features that make a neighborhood 
better than another, thus giving rise to good search algorithms. 



1 Introduction 

Satisfiability (SAT) is the problem of determining, given a Boolean formula f in 
conjunctive normal form, whether there exists a truth assignment to the variables 
that makes the formula true. If all clauses consist of exactly k literals 
then the formula is said to be an instance of k-SAT. While 2-SAT is solvable 
in polynomial time, k-SAT, k ≥ 3, is known to be NP-complete, so we cannot 
expect to have good performance in the worst case. In practice, however, one 
is willing to trade “completeness” for “soundness”. Incomplete algorithms may 
fail to find a satisfying assignment even if one exists but they usually perform 
very well in practice. A well studied algorithm of this sort is the WalkSat heuristic 
[SKC93]. The distinguishing characteristic of this algorithm is the way new 
assignments are examined. The algorithm chooses to flip only variables that appear 
in unsatisfied clauses as opposed to flipping any variable and testing the 
new assignment. Moreover, flips are made even if occasionally they increase the 
number of unsatisfied clauses. 

In this work we make an attempt to differentiate “good” neighborhoods from 
“bad” ones by discovering the combinatorial characteristics a neighborhood must 
have in order to produce a good algorithm. We study the WALKSAT neighborhood 
mentioned above and a simpler one that we call GREEDY in which neighboring 
assignments differ by flipping any variable. Our work was motivated by 
a challenge reported by D. S. Johnson [J00] which we quote below: 

Understanding Metaheuristics 

“. . . Currently the only way to tell whether an algorithm of this 
sort [optimization heuristic] will be effective for a given problem 
is to implement and run it. We currently do not have an adequate 
theoretical understanding of the design of search neighborhoods, 
rules for selecting starting solutions, and the effectiveness of vari- 
ous search strategies. Initial work has given insight in narrow spe- 
cial cases, but no useful general theory has yet been developed. 

We need one.” 

Here, we compare the two neighborhoods by examining how the space of 
neighboring assignments, the search graph as we call it, decomposes into smaller 
regions of related solutions when a quality threshold is imposed on them. Our main 
tool is the “Go with the winners” (GWW) strategy which uses many particles 
that independently search the search graph for a solution of large value. Dimitriou 
and Impagliazzo [DI98] were able to relate the performance of GWW to 
the existence of a combinatorial property of the search graph, the so-called “local 
expansion”. Intuitively, if local expansion holds then particles remain uniformly 
distributed and sampling can be used to deduce properties of the search space. 

Although the process of collecting statistics using GWW may not be a cheap 
one since many particles are used to search the space for good solutions, it is 
a more accurate one than collecting statistics by actually running the heuristic 
under investigation. The reason is that heuristic results may be biased towards 
certain regions of the space and any conclusions we draw may not be true. This 
cannot happen with GWW since when the local expansion property is true, 
particles remain uniformly distributed inside the space of solutions and one may 
use this information to unravel the combinatorial characteristics of the search 
space. These properties can then help optimize heuristic performance and design 
heuristics that take advantage of this information. 

Our goal in this work is to provide algorithm designers with a method of test- 
ing whether a proposed neighborhood structure will give rise to a good search 
algorithm. We do this following a two-step approach. Initially, we verify ex- 
perimentally that the search graph has good local expansion. Then by gathering 
statistics, we compare the combinatorial characteristics of the neighborhood and 
we show that there are certain features that make a neighborhood better than 
another, thus giving rise to good search algorithms. Our results are in some sense 
complementary to the work of Schuurmans and Southey [SS01] for SAT, where 
emphasis is placed on the characteristics of successful search algorithms. Here, 
instead, the emphasis is on the properties of the search space itself. 

2 Go with the Winners Strategy (GWW) 



Most optimization heuristics (Simulated Annealing, Genetic algorithms, Greedy, 
WalkSat, etc.) can be viewed as strategies, possibly probabilistic, of moving a 
particle in a search graph, where the goal is to find a solution of optimal value. 
The placement of the particle in a node of the search graph corresponds to 
examining a potential solution. 

An immediate generalization of this idea is to use many particles to explore 
the space of solutions. This generalization together with the extra feature of 
interaction between the particles is essentially “Go with the Winners” [DI96]. 
The algorithm uses many particles that independently search the space of all 
solutions. The particles however interact with each other in the following way: 
each “dead” particle (a particle that ended its search at a local optimum) is 
moved to the position of a randomly chosen particle that is still “alive”. The goal 
of course is to find a solution of optimal value. Progress is made by imposing 
a quality threshold to the solutions found, thus improving at each stage the 
average number of satisfied clauses. 

In Figure 1, a “generic” version of GWW is presented without any reference 
to the underlying neighborhood of solutions (see Section 3). Three conventions 
are made throughout this work: i) creating a particle means placing the particle 
in a (random or predefined) node of the search graph, ii) the value of a particle 
is the value of the solution it represents and iii) more than one particle may be 
in the same node of the search graph, thus examining the same solution. 

Generic GWW 

Let M_i be the subset of all solutions during stage i of the algorithm, that is, 
assignments satisfying at least i clauses. 

- Stage 0 (Initialization): Generate B random solutions and place one particle 
  in each one of them. 

- Stage i (Main loop): Proceed in two phases. 

  1. In the randomization phase, for each particle, perform an S-step random 
     walk in M_i, where a step is defined as follows: Select a neighbor of the 
     current solution. If its value is > i, move the particle to this neighbor, 
     otherwise leave the particle in its current position. 

  2. In the redistribution phase, if all particles have value i, stop. Otherwise, 
     replace each particle of value i with a copy of a randomly chosen particle 
     whose value is > i. 

  3. Go to the next stage, by raising the threshold i. 

- Output the best solution found. 

Fig. 1. Description of the algorithm 

The algorithm uses two parameters: the number of particles B and the length 
S of each particle’s random walk. The intuition is that in the i-th stage and 
beyond we are eliminating from consideration assignments which satisfy fewer than 
i clauses. This divides the search space into components of assignments. The 
redistribution phase will make sure that particles in locally optimal solutions 
will be distributed among non-local solutions, while the randomization phase 
will ensure that particles remain uniformly distributed inside these emerging 
components. This is shown abstractly in Figure 2 for a part of the search space 
and two thresholds T_A, T_B. When the algorithm is in stage T_A, all solutions of 
value larger than T_A form a connected component and particles are uniformly 
distributed inside this large component. However, when the threshold is increased 
to T_B, the search graph decomposes into smaller components and the goal is to 
have particles in all of them in proportion to their sizes. 
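
The two phases of the generic algorithm can be sketched in Python as below. This is an illustrative reading of Figure 1, not the optimized implementation used in the experiments. Several choices are assumptions of the sketch: the names `gww`, `num_satisfied` and `greedy_neighbors`, the encoding of literals as signed integers, the starting threshold (the smallest value among the initial particles), and the acceptance rule of the walk, which is taken to keep a particle inside M_i, that is, a move is accepted when the neighbor satisfies at least i clauses.

```python
import random


def num_satisfied(clauses, assignment):
    """Count clauses with at least one true literal.

    A literal is a non-zero int: +v stands for variable v, -v for its negation.
    assignment[v] is a bool for v = 1..n; index 0 is unused padding.
    """
    return sum(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )


def greedy_neighbors(clauses, assignment):
    """Variables that may be flipped: here, any variable."""
    return list(range(1, len(assignment)))


def gww(clauses, n_vars, B, S, neighbors, rng):
    """Generic Go-with-the-Winners sketch; returns the best assignment found."""
    def value(a):
        return num_satisfied(clauses, a)

    # Stage 0: B particles placed on random assignments.
    particles = [
        [False] + [rng.random() < 0.5 for _ in range(n_vars)] for _ in range(B)
    ]
    threshold = min(value(p) for p in particles)  # assumed starting stage
    while True:
        # Randomization phase: S-step random walk that stays inside M_threshold.
        for p in particles:
            for _ in range(S):
                options = neighbors(clauses, p)
                if not options:
                    break
                var = rng.choice(options)
                p[var] = not p[var]
                if value(p) < threshold:
                    p[var] = not p[var]  # reject the move, stay in place
        # Redistribution phase: particles sitting exactly at the threshold are
        # replaced by copies of randomly chosen particles strictly above it.
        values = [value(p) for p in particles]
        alive = [k for k, val in enumerate(values) if val > threshold]
        if not alive:
            break  # all particles have value equal to the threshold: stop
        for k, val in enumerate(values):
            if val == threshold:
                particles[k] = list(particles[rng.choice(alive)])
        threshold += 1  # next stage
    return max(particles, key=value)
```

A call such as `gww(clauses, n_vars, B=8, S=10, neighbors=greedy_neighbors, rng=random.Random(1))` runs the sketch with the GREEDY neighborhood; passing a different `neighbors` function changes the search graph without touching the rest of the procedure.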

Dimitriou and Impagliazzo [DI98] characterized the search spaces where 
GWW works well in terms of a combinatorial parameter, the local expansion 
of the search space. Intuitively this property suggests that if a particle starts a 
random walk, then the resulting solution will be uncorrelated to its start and it 
is unlikely that particles will be trapped into small regions of the search space. 

Studying the behavior of the algorithm with respect to its two important 
parameters will offer a tradeoff between using larger populations or longer walks. 
This in turn may give rise to implementations of the algorithm which produce 
high quality solutions, even when some of these parameters are kept small. In 
particular, our goal will be to show that for certain neighborhoods only a few 
particles suffice to search the space of solutions provided the space has good 
expansion properties and the length of the random walk is sufficiently long. 
When this is true, having a few particles is statistically the same as having one, 
therefore this particle together with the underlying neighborhood can be thought 
of as defining an optimization heuristic. 





Fig. 2. Decomposition of the search graph. 



3 Distribution of SAT Instances and Search Graphs 

It is known that random 3-SAT formulas exhibit a threshold behavior. What this 
means is that there is a constant r so that formulas with more than rn clauses 
are likely to be unsatisfiable, while formulas with fewer than rn clauses are likely 
to be satisfiable. In all experiments we generated formulas with ratio of clauses 
to variables r = 4.26, since, as was found experimentally [MSL92,GW94], the 
hardest instances to solve occur at this ratio. 
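
A formula of this kind can be generated with a few lines of Python. The sketch below assumes the usual fixed-clause-length model (three distinct variables per clause, each negated with probability 1/2), which the text does not spell out; the function name `random_3sat` and the signed-integer encoding of literals are likewise assumptions.

```python
import random


def random_3sat(n_vars, ratio=4.26, rng=None):
    """Random 3-SAT formula with round(ratio * n_vars) clauses.

    Each clause is a tuple of three literals over distinct variables 1..n_vars;
    +v denotes variable v and -v its negation.
    """
    rng = rng or random.Random()
    n_clauses = round(ratio * n_vars)
    clauses = []
    for _ in range(n_clauses):
        variables = rng.sample(range(1, n_vars + 1), 3)  # three distinct variables
        clauses.append(tuple(v if rng.random() < 0.5 else -v for v in variables))
    return clauses
```

With `n_vars = 100` this produces 426 clauses.
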

We proceed now to define the neighborhood structure of solutions. GWW 
is an optimization heuristic that tries to locate optimal solutions in a search 
graph whose nodes represent all feasible solutions for the given problem. Two 
nodes in the search graph are connected by an edge if one solution results 
from the other by making a local change. The set of neighbors of a given node 
defines the neighborhood of solutions. Such a search graph is implicitly defined 
by the problem at hand and doesn’t have to be computed explicitly. The only 
operations required by the algorithm are i) generating a random node (solution) 
in the search graph, ii) computing the value of a given node, and iii) efficiently 
listing all the neighboring solutions. The two search graphs we consider are: 

GREEDY 

Nodes correspond to assignments. Two such nodes are connected by an edge if 
one results from the other by flipping the truth value of a single variable. Thus 
any node in this search graph has exactly n neighbors, where n is the number 
of variables in the formula. The value of a node is simply the number of clauses 
satisfied by the corresponding assignment. 

WALKSAT 

Same as above except that two such nodes are considered neighboring if one 
results from the other by flipping the truth value of a variable that belongs to 
an unsatisfied clause. 
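
The two neighborhoods can be sketched in a few lines of Python. This is only an
illustration under our own assumptions, not the authors' implementation: the
clause encoding (tuples of signed integers), the 1-based assignment list, and
all function names are hypothetical.

```python
import random

def random_3sat(n, m, rng):
    """Return m random 3-clauses over variables 1..n (signed ints = literals)."""
    formula = []
    for _ in range(m):
        variables = rng.sample(range(1, n + 1), 3)  # three distinct variables
        formula.append(tuple(v if rng.random() < 0.5 else -v for v in variables))
    return formula

def is_satisfied(clause, assignment):
    # assignment[v] is the truth value of variable v; index 0 is unused
    return any(assignment[abs(lit)] == (lit > 0) for lit in clause)

def value(formula, assignment):
    """Node value: number of clauses satisfied by the assignment."""
    return sum(is_satisfied(c, assignment) for c in formula)

def greedy_flips(formula, assignment):
    """GREEDY neighborhood: any of the n variables may be flipped."""
    return list(range(1, len(assignment)))

def walksat_flips(formula, assignment):
    """WALKSAT neighborhood: only variables occurring in an unsatisfied clause."""
    flips = set()
    for clause in formula:
        if not is_satisfied(clause, assignment):
            flips.update(abs(lit) for lit in clause)
    return sorted(flips)

def flip(assignment, v):
    """Return the neighbor obtained by flipping variable v."""
    neighbor = list(assignment)
    neighbor[v] = not neighbor[v]
    return neighbor
```

By construction the WALKSAT flips are always a subset of the GREEDY flips, so a
WALKSAT node never has more than n neighbors.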

Our motivation for studying these search graphs is to explain why the WALKSAT
heuristic performs so well in practice as opposed to the simple GREEDY
heuristic (which essentially corresponds to GSAT). Does this second search graph
have some interesting combinatorial properties that explain the apparent success
of this heuristic? We will try to answer some of these questions in the sections
that follow.

4 Implementation Details 

We used a highly optimized version of GWW in which we avoided the explicit 
enumeration of neighboring solutions in order to perform the random walks for 
each particle. 

In particular we followed a “randomized approach” to picking the right neigh- 
bor. To perform one step of the random walk, instead of just enumerating all 
possible neighbors we simply pick one of the n potential neighbors at random 
and check its value. If it is larger than the current threshold we place the particle 
there, otherwise we repeat the experiment. How many times do we have to do 
this in order to perform one step of the random walk? It can be shown (see for 
example Figure 4 illustrating the number of neighbors at each threshold) that 
in the earlier stages of the algorithm we only need a few tries. At later stages, 
when neighbors become scarce, we simply have to remember which flips we have 
tried and in any case never perform more than n flips. 

The savings are great because, as can be seen in the same figure, it is only in
the last stages that the number of neighbors drops below n, thus requiring more
than one try on average. In all other stages each random step takes constant
time.
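
A minimal sketch of one such randomized step is given below. It is our own
reading of the description above, not the authors' code: the function name, the
`value_of` callback, and the 1-based assignment list are assumptions. For
brevity the candidate is copied and re-evaluated from scratch, so a real
implementation would need incremental evaluation to obtain constant-time steps.

```python
import random

def random_walk_step(assignment, n, value_of, threshold, rng):
    """Pick random flips, never repeating one, until a neighbor beats the threshold.

    Returns (new_assignment, number_of_tries). If none of the n flips is
    admissible, the particle stays where it is after n tries.
    """
    tried = set()
    while len(tried) < n:
        v = rng.randint(1, n)          # candidate variable to flip
        if v in tried:                 # remember which flips were already tried
            continue
        tried.add(v)
        candidate = list(assignment)
        candidate[v] = not candidate[v]
        if value_of(candidate) > threshold:
            return candidate, len(tried)
    return assignment, n
```

In the early stages almost every flip is admissible, so the loop usually ends
after the first try; the `tried` set only matters when neighbors become scarce.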

5 Experiments 

The two important parameters of GWW are population size B and random walk 
length S. Large, interacting populations allow the algorithm to reach deeper 
levels of the search space while long random walks allow GWW to escape from 
local optima that are encountered when one is using only greedy moves. The use 
of these two ingredients has the effect of maintaining a uniform distribution of 
particles throughout the part of the search graph being explored. 

The goal of the experiments is to understand the relative importance of these 
parameters and the structure of the search graph. In the first type of experiments 
(Section 5.1) we will try to reveal the expansion characteristics of the search 
graph and validate the implementation choices of the algorithm. If the search 
graph has good expansion, this can be shown by a series of tests that indicate 
that sufficiently long random walks uniformly distribute the particles. 

The purpose of the second type of experiments (Sections 5.2 and 5.3) is to 
study the quality of the solutions found as a function of B and S. In particular, 
we would like to know what is more beneficial to the algorithm: to invest in
more particles or longer random walks? Any variation in the quality of solutions
returned by the algorithm as a function of these two parameters will illustrate 
different characteristics of the two neighborhoods. Then, hopefully, all these 
observations can be used to tell what makes a neighborhood better than another 
and how to design heuristics that take advantage of the structure of the search 
space. 

In the following experiments we tested formulas with 200 variables and 857 
clauses and each sample point on the figures was based on averaging over 500 
such random formulas. To support our findings we repeated the experiments with 
formulas with many more variables and again the same patterns of behavior were 
observed. 

5.1 Optimal Random Walk Length 

Before we proceed with the core of the experiments we need to know whether 
our “randomized” version of GWW returns valid statistics. This can be tested 
with a series of experiments that compare actual data collected by the algorithm 
with data expected to be true for random formulas. For these comparisons to 
be valid we need to be sure that particles remain well distributed in the space 
of solutions. If particles are biased towards certain regions of the space this will 
be reflected on the statistics and any conclusions we draw may not be true. So, 
how can we measure the right walk length so that particles remain uniformly 
distributed? 

One such test is to compute for each particle the Hamming distance between
the assignments before and after the random walk, and compare it with the
distance expected in the random formula model. If the length S of the walk is
sufficiently large, these quantities should match.

Fig. 3. Choosing the right walk length that achieves uniformity for GREEDY (left)
and WALKSAT (right) neighborhoods: Average Hamming distance between start and
end of the random walk for S = 100, 200, 300, 500, 1000.

In Figure 3 we show the results of this experiment for a medium population
size of B = 100 particles and various walk lengths (S = 100, 200, 300, 500,
1000). For each stage (shown are stages where the number of satisfied clauses
is > 600) we computed the average Hamming distance between solutions
(assignments) at start and end of the random walk, and we plotted the average
over all particles. As we can see the required length to achieve uniformity in 
the GREEDY neighborhood is about 1000 steps. For this number of steps the 
average Hamming distance matches the one expected when working with ran- 
dom formulas, which is exactly n/2. It is only in the last stages of the algorithm 
that the Hamming distance begins to drop below the n/2 value as not all flips 
give rise to good neighbors (compare with Figure 4). The same is true for the 
WALKSAT neighborhood. The required length is again about 1000 steps. Here 
however, the Hamming distance is slightly smaller as flips are confined only to 
variables found in unsatisfied clauses. Thus we conclude that 1000 steps seems 
to be a sufficient walk length so that particles remain uniformly distributed in 
the space of solutions. 
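
To see why a walk of about 1000 steps is plausible for n = 200, consider the
idealized case in which every flip is admissible (as in the early GREEDY
stages). Each step then flips a uniformly random bit, and the expected Hamming
distance after S steps is (n/2)(1 - (1 - 2/n)^S). The sketch below compares this
closed form with a simulation; the function names are ours, and the
unconstrained walk is a simplifying assumption that ignores thresholds.

```python
import random

def hamming(a, b):
    """Number of positions in which two assignments differ."""
    return sum(x != y for x, y in zip(a, b))

def expected_distance(n, steps):
    """Expected Hamming distance after `steps` uniform single-bit flips on n bits."""
    return n / 2 * (1 - (1 - 2 / n) ** steps)

def simulate_distance(n, steps, trials, rng):
    """Average start-to-end Hamming distance of an unconstrained random walk."""
    total = 0
    for _ in range(trials):
        start = [rng.random() < 0.5 for _ in range(n)]
        end = list(start)
        for _ in range(steps):
            i = rng.randrange(n)
            end[i] = not end[i]
        total += hamming(start, end)
    return total / trials
```

Under this assumption `expected_distance(200, 100)` is roughly 63, well short of
n/2 = 100, while `expected_distance(200, 1000)` is within 0.01 of 100.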

A second test we performed to compute the appropriate walk length was to 
calculate the average number of neighbors of each particle at each threshold and 
compare it with the expected values in the random formulas model. The results 
are shown in Figure 4(left) for a population size B = 100 and the optimal ran- 
dom walk length (S = 1000). For the GREEDY neighborhood, at least in the 
early stages of the algorithm, all the flips should lead to valid neighbors, so their 
number should be equal to n, the number of variables. For the WALKSAT neigh- 
borhood the number of neighbors should be smaller than n as flips are confined 
to variables in unsatisfied clauses. As can be seen, the collected averages match
the values expected for random formulas, supporting the validity of the
randomized implementation and the uniform distribution of particles.

Fig. 4. Comparing the number of neighbors (left) and unsatisfied clauses (right) at
various thresholds with those expected for random formulas, for both neighborhoods.
The collected data matched the expectations.

Finally in Figure 4(right), we performed yet another test to examine the
hypothesis that 1000 steps are sufficient to uniformly distribute the particles.
In this test we examined the average number of satisfied clauses of the GWW
population at each threshold and compared them with those expected for random
formulas. Again the two curves matched, showing that particles remain well
distributed. In this particular plot it is instructive to observe a qualitative difference
between the two neighborhoods. In the GREEDY one, the algorithm starts with 
1/8 of the clauses unsatisfied and it is only in the last stages that this number 
begins to drop. In the WALKSAT neighborhood however, this number is much 
smaller. It seems that the algorithm already has an advantage over the GREEDY 
implementation as it has to satisfy fewer unsatisfied clauses (approximately 5/64 
of the clauses against 8/64 of the GREEDY neighborhood). Does this also mean 
that the algorithm will need fewer “resources” to explore the space of solutions? 
We will try to answer this next. 
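
The 1/8 starting fraction for GREEDY follows from a short calculation, sketched
here under the assumption that each clause contains three distinct variables and
that the initial assignment is uniformly random; m = 857 is the number of
clauses used in the experiments.

```latex
\Pr[\text{clause unsatisfied}] = \left(\tfrac{1}{2}\right)^{3} = \tfrac{1}{8},
\qquad
\mathbb{E}[\#\text{unsatisfied clauses}] = \frac{m}{8} = \frac{857}{8} \approx 107 .
```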

5.2 Characterizing the Two Neighborhoods 

In this part we would like to study the quality of the solutions found by GWW 
as a function of its two mechanisms, population size and random walk length. 
Studying the question of whether to invest in more particles or longer random 
walks will illustrate different characteristics of the two neighborhoods. In partic- 
ular, correlation of the quality of the solutions with population size will provide 
information about the connectedness of the search graphs, while correlation with 
random walk length will tell us more about the expansion characteristics of these 
search spaces. 

We first studied the effect of varying the population size B while keeping the
number of random steps S constant, for various walk lengths (S = 4, 8, 16, 32).
The number of satisfied clauses increased as a function of B and the curves 
looked asymptotic, like those of Figure 5. In general, the plots corresponding to 
the two neighborhoods were similar with the exception that results were slightly 
better for the WALKSAT neighborhood. 

Fig. 5. Average number of clauses satisfied vs random walk length for various
population sizes. Results shown are for GREEDY (left) and WALKSAT neighborhoods
(right).

In Figure 5 we studied the effect of varying the number of steps while keeping
the population size constant, for various values of B (B = 2, 4, 8, 16, 32). As
can be seen, increasing the population size resulted in better solutions found.
But there is a distinctive difference between the two neighborhoods. While in 
the first increasing the particles had a clear impact on the quality of solutions 
found, in the second, having 8 particles was essentially the same as having 32 
particles. Furthermore, 2 particles searching the WALKSAT neighborhood ob- 
tained far better results than 4 particles searching the GREEDY one, and in 
general the GREEDY neighborhood required twice as many particles to achieve 
results comparable to those obtained in the WALKSAT one. Thus we note
immediately that the second neighborhood is better suited for search with fewer
particles, reaching the same solution quality provided the walks are kept
sufficiently long. This agrees with results found by Hoos [Ho99], where
a convincing explanation is given for this phenomenon: Greedy (i.e. GSAT) is
essentially incomplete, so numerous restarts are necessary to remedy this
situation. Since a restart is like introducing a new search particle, although
without the benefit of interaction between particles as in GWW, our findings
corroborate this result.

It is evident from these experiments that increasing the computational
resources (particles or walk steps) improves the quality of the solutions. We also
have an indication that the number of particles is not so important in the
WALKSAT neighborhood. So, to make this difference even more striking, we proceeded
to examine the effect on the solutions found when the product B × S, and hence
the running time, was kept constant.¹ The results are shown in Figure 6.

It is clear that as the number of particles increases the average value also 
increases, but there is a point where this behavior stops and the value of solutions 
starts to decline. This is well understood. As the population becomes large, the 
number of random steps decreases (under the constant time constraint) and this 
has the effect of not allowing the particles to escape from bad regions. 
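
The constant-budget experiment amounts to sweeping over all ways of splitting a
fixed product B × S. A small sketch follows; restricting B to powers of two
mirrors the population sizes used earlier, but that restriction, the function
name, and the example budget are our assumptions (the budgets behind Figure 6
are not stated in the text).

```python
def budget_splits(budget, max_particles):
    """All (B, S) with B a power of two, B <= max_particles and B * S == budget."""
    splits = []
    b = 1
    while b <= max_particles and b <= budget:
        if budget % b == 0:
            splits.append((b, budget // b))
        b *= 2
    return splits
```

For example, `budget_splits(1024, 32)` yields the six settings from one particle
with 1024 steps down to 32 particles with 32 steps each, all with the same
nominal running time.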



¹ The running time of GWW is O(BSm), where m is the maximum number of stages
(formula clauses).

Fig. 6. Average solution value vs number of particles for different values of B × S.
Results shown are for GREEDY (left) and WALKSAT (right) neighborhoods.



Perhaps what is more instructive to observe is the point in the two neighbor- 
hoods when this happens. In the GREEDY neighborhood this behavior is more 
balanced between particles and length of random walk, as one resource does not 
seem more important than the other. In the WALKSAT neighborhood however, 
things change drastically as the quality of the solutions found degrades when 
fewer steps and more particles are used! Thus having longer random walks is 
more beneficial to the algorithm than having larger populations. This comes to 
validate the observation we made in the beginning of this section that the sec- 
ond neighborhood is more suited for search with fewer particles. When taken to 
the extreme this explains the apparent success of WALKSAT type of algorithms 
in searching for satisfying assignments as they can be viewed as strategies for 
moving just one particle around the WALKSAT neighborhood. 

To illustrate the difference in the two neighborhoods we performed 300 runs 
of GWW using B = 2 and S = 1000 and we presented the results in Figure 7 as a 
histogram of solutions. The histograms of the two neighborhoods were displayed 
on the same axis as the overlap of values was very small. There was such strong 
separation between the two neighborhoods that even when we normalized for 
running time the overlap was still very small. Indeed the best assignment found 
in the GREEDY neighborhood satisfied only 827 clauses, which is essentially the 
worst case for the WALKSAT neighborhood. One can thus conclude that the 
second implementation is intrinsically more powerful and more suited for local 
optimization than the first one, even when running time is taken into account. 

5.3 Further Properties of the Search Graphs 

The results of the previous section suggest that the WALKSAT neighborhood 
is more suited for local search as only a few particles can locate the optimal 
solutions. So, we may ask what is the reason behind this discrepancy between 
the two neighborhoods? 

We believe the answer must lie in the structure of the search graphs. As the
GREEDY space decomposes into components by imposing the quality threshold
to solutions, good solutions must reside in pits with high barriers that render
long random walks useless. So the algorithm requires more particles as smaller
populations usually get trapped. In the case of the WALKSAT neighborhood,
the search graph must be “smooth” in the sense that good solutions must not
be hidden in such deep pits. Thus the algorithm needs only a few particles to
hit these regions.

Fig. 7. Histogram of satisfied clauses for 300 runs of GWW with B = 2 and S = 1000.
Results shown are for GREEDY and WALKSAT neighborhoods.

Fig. 8. Results of performing simple greedy optimization starting from particles at
various thresholds. The plots show the average number of downhill moves required
before getting trapped into a local optimum in both neighborhoods.

We tested this hypothesis with the following experiment: Once the particles
were uniformly distributed after the randomization phase, we performed
simple greedy optimization starting from each particle’s position and recorded
the number of downhill moves (we use this term even though this is a
maximization problem) required before the particle gets trapped in a local
optimum. Then we averaged over all particles and proceeded with the next stage.
So, essentially we counted the average (downhill) distance a particle has to
travel before it gets trapped. Since by assumption the particles are uniformly
distributed, this reveals the depth of the pits.
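
The depth measurement can be sketched as follows. The text only says "simple
greedy optimization", so the steepest-ascent rule used here is an assumption,
as are the function name and the `neighbors_of` / `value_of` callbacks.

```python
def greedy_depth(assignment, neighbors_of, value_of):
    """Count improving moves of steepest ascent until a local optimum is reached.

    Returns (number_of_moves, local_optimum).
    """
    current = list(assignment)
    current_value = value_of(current)
    moves = 0
    while True:
        best, best_value = None, current_value
        for candidate in neighbors_of(current):
            v = value_of(candidate)
            if v > best_value:             # strictly better neighbor found
                best, best_value = candidate, v
        if best is None:                   # no improving neighbor: trapped
            return moves, current
        current, current_value = best, best_value
        moves += 1
```

Averaging the returned move count over all particles of a stage gives one data
point of the kind plotted in Figure 8.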

As can be seen in Figure 8, particles in the WALKSAT implementation
had to overcome smaller barriers than those in the GREEDY one. (One may
object that the average improvement per move may be larger in the
WALKSAT neighborhood. But this is not the case, as we found that the improvement
is ≈ 2 clauses/move in the GREEDY space against ≈ 1.6 clauses/move in the
WALKSAT one.) Moreover, the same patterns were observed when we repeated
the experiment with formulas having 500 and 1000 variables. This is a remarkable
result. It simply says that the WALKSAT neighborhood is in general smoother
than the GREEDY one and easier to search with only a few particles, which
perhaps also explains why GSAT doesn’t match the performance of WalkSat.
Again our findings corroborate some earlier results: Schuurmans and Southey
[SS01] identify three measures of local search effectiveness, one of which is
depth, corresponding to the number of unsatisfied clauses. Similarly, the depth
of GSAT is intensively studied by Gent and Walsh [GW93].

Finally we performed another experiment similar to the previous one (not
shown here due to space restrictions) in which we counted the average number of
neighbors of each greedily obtained solution. A number smaller than the number
of neighbors found by GWW (Figure 4) would be an indication that solutions
lay inside deep pits. The results again supported the “smoothness” conjecture;
the number of neighbors of greedily obtained solutions in the WALKSAT space
was much closer to the expected curve than that of the GREEDY one.

6 Conclusions and Future Research 

Many optimization algorithms can be viewed as strategies for searching a space
of potential solutions in order to find a solution of optimal value. The success of
these algorithms depends on the way the underlying search graph is implicitly
defined and in particular on the way a new potential solution is generated by
making local changes to the current one. As was mentioned in [J00], currently
the only way to tell whether an algorithm of this sort will be effective for a
given problem is simply to implement and run it. So, one of the challenges
algorithm designers face is to design the right search neighborhood so that the
corresponding optimization heuristic behaves as expected.

In this work we considered the problem of differentiating good neighborhoods
from bad ones. In particular, we studied two search neighborhoods for the 3-SAT
problem, a simple one which we called GREEDY and a more refined one that we
called WALKSAT. In the first one, neighboring assignments were generated by
flipping the value of any of the n variables, while in the second one only variables
that belong to unsatisfied clauses were flipped. Our motivation for this work
was inspired by the challenge mentioned above and the need to explain the
apparent success of WALKSAT type of algorithms in finding good satisfying
assignments since all of them are based on the WALKSAT neighborhood.

We gave evidence that it is possible to tell in advance what neighborhood
structure will give rise to a good search algorithm by comparing the
combinatorial characteristics of the two neighborhoods. We used as a platform for
testing neighborhoods “Go with the winners”, an algorithm that uses many
particles that independently search the space of solutions. By gathering statistics
we showed that there are certain features that make one neighborhood better
than another, thus giving rise to a good search algorithm.

In particular, we noticed that the WALKSAT neighborhood was better suited
for search with fewer particles, statistically the same as one. We expected this to
be true since we knew that the WalkSat heuristic performs extremely well, but
we were surprised by the extent to which this was true. We thought that having
a more balanced setting between particles and walk lengths would be more
beneficial to GWW but this was the case only for the GREEDY neighborhood.

Although we studied only one type of problem (SAT), we believe that search
spaces for which good heuristics exist must have “characteristics” similar to
those of the WALKSAT space, and that this can be verified using our approach.
Specifically, to test whether a neighborhood will give rise to a good search
algorithm, run GWW and study the tradeoff between particles and random walk
steps. If GWW can discover good solutions with just a few particles and long
enough walks, then this space is a candidate for a good search heuristic. These
observations lead to some interesting research directions:



- Provide more evidence for the previous conjecture by trying to analyze the
  search spaces of other optimization problems for which good algorithms exist.
  Do the corresponding search graphs have similar properties to the WALKSAT
  one? This line of research would further validate GWW as a tool for
  collecting valid statistics.

- What are the properties of search spaces for which no good algorithms exist?
  Are they in any sense complementary to those defined here?

- Understand how the WALKSAT space decomposes into components by imposing
  the quality threshold to solutions. We believe that the WALKSAT
  space decomposes into only a few components (which also explains why one
  doesn’t need many particles) but this remains to be seen.

- Can we find a simple neighborhood that behaves even better than WALKSAT?
  If this neighborhood has similar characteristics to WALKSAT’s (sufficiency
  of a few particles to locate good solutions, small barriers, etc.) it
  will probably give rise to even better satisfiability algorithms. (Introducing
  weights [F97,WW97] smooths out the space but does not meet our definition
  of a neighborhood.)

- More ambitiously, try to analyze WalkSat and its variants and prove that
  they work in polynomial time (for certain ratios of clauses to variables, of
  course). This ambitious plan is supported by the fact that similar findings
  for graph bisection [DI98,Ga01] ultimately led to a polynomial time, local
  search algorithm for finding good bisections [GI01].



Abstract. In the single-source unsplittable flow problem, commodities
must be routed simultaneously from a common source vertex to certain 
sinks in a given graph with edge capacities. The demand of each com- 
modity must be routed along a single path so that the total flow through 
any edge is at most its capacity. This problem was introduced by Klein- 
berg [12] and generalizes several NP-complete problems. A cost value per 
unit of flow may also be defined for every edge. In this paper, we implement
the 2-approximation algorithm of Dinitz, Garg, and Goemans [6]
for congestion, which is the best known, and the (3, 1)-approximation algorithm
of Skutella [19] for congestion and cost, which is the best known
bicriteria approximation. We study experimentally the quality of approx- 
imation achieved by the algorithms and the effect of heuristics on their 
performance. We also compare these algorithms against the previous best 
ones by Kolliopoulos and Stein [15]. 



1 Introduction 

In the single-source unsplittable flow problem (Ufp), we are given a directed
graph $G = (V, E)$ with edge capacities $u : E \to \mathbb{R}^+$, a designated
source vertex $s \in V$, and $k$ commodities each with a terminal (sink) vertex
$t_i \in V$ and associated demand $d_i \in \mathbb{R}^+$, $1 \le i \le k$. For
each $i$, we have to route $d_i$ units of commodity $i$ along a single path from
$s$ to $t_i$ so that the total flow through an edge $e$ is at most its capacity
$u(e)$. As is standard in the relevant literature we assume that no edge can be
a bottleneck, i.e., the minimum edge capacity is assumed to have value at least
$\max_i d_i$. We will refer to instances which satisfy this assumption as
balanced, and ones which violate it as unbalanced. Instances in which the
maximum demand is $\rho$ times the minimum capacity, for $\rho > 1$, are
$\rho$-unbalanced. A relaxation of Ufp is obtained by allowing the demands of
commodities to be split along more than one path; this yields a standard maximum
flow problem. We will call a solution to this relaxation a fractional or
splittable flow.

* Part of this work was done while at the Department of Computing and Software,
McMaster University. Partially supported by NSERC Grant 227809-00.

** Partially supported by NSERC Grant 227809-00.

In this paper we use the following terminology. A flow $f$ is called feasible if
it satisfies all the demands and respects the capacity constraints, i.e.,
$f(e) \le u(e)$ for all $e \in E$. An unsplittable flow $f$ can be specified as a
flow function on the edges or equivalently by a set of paths
$\{P_1, \ldots, P_k\}$, where $P_i$ starts at the source $s$ and ends at $t_i$,
such that $f(e) = \sum_{i : e \in P_i} d_i$ for all edges $e \in E$. Hence in
the Ufp a feasible solution means a feasible unsplittable flow. If a cost function
$c : E \to \mathbb{R}^+$ on the edges is given, then the cost $c(f)$ of flow $f$
is given by $c(f) = \sum_{e \in E} f(e) \cdot c(e)$. The cost $c(P_i)$ of a path
$P_i$ is defined as $c(P_i) = \sum_{e \in P_i} c(e)$, so that the cost of an
unsplittable flow $f$ given by paths $P_1, \ldots, P_k$ can also be written as
$c(f) = \sum_{i=1}^{k} d_i \cdot c(P_i)$. In the version of the Ufp with costs,
apart from the cost function $c : E \to \mathbb{R}^+$ we are also given a budget
$B \ge 0$. We seek a feasible unsplittable flow whose total cost does not exceed
the budget. Finally, we set $d_{\max} = \max_{1 \le i \le k} d_i$,
$d_{\min} = \min_{1 \le i \le k} d_i$, and $u_{\min} = \min_{e \in E} u(e)$. For
$a, b \in \mathbb{R}^+$ we write $a \mid b$ and say that $b$ is $a$-integral if
and only if $b \in a \cdot \mathbb{N}$.

The feasibility question for Ufp (without costs) is strongly NP-complete
[12]. Various optimization versions can be defined for the problem. In this study
we focus on minimizing congestion: Find the smallest $\alpha \ge 1$ such that
there exists a feasible unsplittable flow if all capacities are multiplied by
$\alpha$. Among the different optimization versions of Ufp the congestion metric
admits the currently best approximation ratios. Moreover congestion has been
studied extensively in several settings for its connections to multicommodity
flow and cuts.
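
The definitions of edge flow, cost, and congestion translate directly into code
for a given set of paths. The sketch below evaluates a fixed unsplittable flow;
it does not solve the optimization problem. Representing a path as a list of
vertices and capacities as a dictionary keyed by edge are our assumptions, as
are all function names.

```python
from collections import defaultdict

def edges_of(path):
    """Consecutive vertex pairs of a path given as a list of vertices."""
    return list(zip(path, path[1:]))

def edge_flows(paths, demands):
    """f(e) = sum of d_i over all commodities whose path P_i uses edge e."""
    flow = defaultdict(float)
    for path, d in zip(paths, demands):
        for edge in edges_of(path):
            flow[edge] += d
    return dict(flow)

def congestion(paths, demands, capacity):
    """Smallest alpha >= 1 with f(e) <= alpha * u(e) on every edge."""
    flow = edge_flows(paths, demands)
    return max([1.0] + [f / capacity[e] for e, f in flow.items()])

def cost(paths, demands, edge_cost):
    """c(f) = sum over i of d_i * c(P_i)."""
    return sum(d * sum(edge_cost[e] for e in edges_of(p))
               for p, d in zip(paths, demands))
```

A flow is feasible for the original capacities exactly when `congestion`
returns 1.0.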

Previous work. Ufp was introduced by Kleinberg [12] and contains several well- 
known NP-complete problems as special cases: Partition, Bin Packing, schedul- 
ing on parallel machines to minimize makespan [12]. In addition Ufp generalizes 
single-source edge-disjoint paths and models aspects of virtual circuit routing. 
The first constant-factor approximations were given in [13]. Kolliopoulos and 
Stein [14,15] gave a 3-approximation algorithm for congestion with a simultane- 
ous performance guarantee 2 for cost, which we denote as a (3, 2)-approximation. 
Dinitz, Garg, and Goemans [6] improved the congestion bound to 2. To be more 
precise, their basic result is: any splittable flow satisfying all demands can be 
turned into an unsplittable flow while increasing the total flow through any edge 
by less than the maximum demand, and this is tight [6]. It is known that when the
congestion of the fractional flow is used as a lower bound the factor 2 increase
in congestion is unavoidable. Skutella [19] improved the (3, 2)-approximation
algorithm for congestion and cost [14] to a (3, 1)-approximation algorithm.

In terms of negative results, Lenstra, Shmoys, and Tardos [16] show that
the minimum congestion problem cannot be approximated within less than 3/2,
unless P = NP. Skutella [19] shows that, unless P = NP, congestion cannot be
approximated within less than $(1 + \sqrt{5})/2 \approx 1.618$ for the case of
$((1 + \sqrt{5})/2)$-unbalanced instances. Erlebach and Hall [7] prove that for
arbitrary $\varepsilon > 0$ there is no $(2 - \varepsilon, 1)$-approximation
algorithm for congestion and cost unless P = NP. Matching this bicriteria lower
bound is a major open question.

This work. As a continuation of the experimental study initiated by Kolliopoulos 
and Stein [15], we present an evaluation of the current state-of-the-art algorithms 
from the literature. We implement the two currently best approximation algo- 
rithms for minimizing congestion: (i) the 2-approximation algorithm of Dinitz, 
Garg, and Goemans [6] (denoted DGGA) and (ii) the (3, 1)-approximation
algorithm of Skutella [19] (denoted SA) which simultaneously minimizes the cost.
We study experimentally the quality of approximation achieved by the algo- 
rithms, and the effect of heuristics on approximation and running time. We also 
compare these algorithms against two implementations of the Kolliopoulos and 
Stein [14,15] 3-approximation algorithm (denoted KSA). Extensive experiments 
on the latter algorithm and its variants were reported in [15]. 

The goal of our work is to examine primarily the quality of approximation. We 
also consider the time efficiency of the approximation algorithms we implement. 
Since our main focus is on the performance guarantee we have not extensively 
optimized our codes for speed and we use a granularity of seconds to indicate the 
running time. Our input data comes from four different generators introduced in 
[15]. The performance guarantee is compared against the congestion achieved by 
the fractional solution, which is always taken to be 1. This comparison between 
the unsplittable and the fractional solution mirrors the analyses of the algo- 
rithms we consider. Moreover it has the benefit of providing information on the 
“integrality” gap between the two solutions. In general terms, our experimental 
study shows that the approximation quality of the DGGA is typically better, by 
a small absolute amount, than that of the KSA. Both algorithms behave consis- 
tently better than the SA. However the latter remains competitive for minimum 
congestion even though it is constrained by having to meet the budget require- 
ment. All three algorithms achieve approximation ratios which are typically well 
below the theoretical ones. After reviewing the algorithms and the experimental 
setting we present the results in detail in Section 5. 

2 The 2-Approximation Algorithm for Minimum Congestion

In this section we briefly present the DGGA [6] and give a quick overview of the 
analysis as given in [6]. The skeleton of the algorithm is given in Fig. 1. 

We explain the steps of the main loop. Certain edges, labeled as singular,
play a special role. These are the edges $(u, v)$ such that $v$ and all the
vertices reachable from $v$ have out-degree at most 1. To construct an
alternating cycle $C$ we begin from an arbitrary vertex $v$. From $v$ we follow
outgoing edges as long as possible, thereby constructing a forward path. Since
the graph is acyclic, this procedure stops, and we can only stop at a terminal,
$t_i$. We then construct a backward path by beginning from any edge entering
$t_i$ distinct from the edge that was used to reach $t_i$ and following singular
incoming edges as far as possible. We thus stop at the first vertex, say $v'$,
which has another edge leaving it. We now continue by constructing a forward
path from $v'$. We proceed in this manner till we reach a vertex, say $w$, that
was already visited. This creates a cycle. If the two paths containing $w$ in
the cycle are of the same type then they both have to be forward paths, and we
glue them into one forward path. Thus the cycle consists of alternating forward
and backward paths.



DGG-Algorithm:

Input: A directed graph G = (V, E) with a source vertex s ∈ V, k commodities
i = 1, ..., k with terminals t_i ∈ V \ {s} and positive demands d_i, and
a (splittable) flow on G satisfying all demands.

Output: An unsplittable flow given by a path P_i from s to each terminal t_i,
1 ≤ i ≤ k.

remove all edges with zero flow and all flow cycles from G;
preliminary phase:
    i := 1;
    while i ≤ k do
        while there is an incoming edge e = (v, t_i) with flow ≥ d_i do
            move t_i to v;
            add e to P_i;
            decrease the flow on e by d_i;
            remove e from G if the flow on e vanishes;
        i := i + 1;
main loop:
    while outdegree(s) > 0 do
        construct an alternating cycle C;
        augment flow along C;
        move terminals as in the preliminary phase giving preference to singular
        edges;
return P_1, ..., P_k;



Fig. 1. Algorithm DGGA. 
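
The preliminary phase of Fig. 1 can be sketched as follows. This is a minimal illustration only: the function name, the representation of the flow as a dictionary from edges to values, and the toy instance are assumptions, and the main loop (alternating cycles, singular edges) is deliberately left out.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def preliminary_phase(flow: Dict[Edge, float],
                      position: Dict[int, str],
                      demand: Dict[int, float]) -> Dict[int, List[Edge]]:
    """Move each terminal backwards along edges carrying at least its demand.

    `flow` and `position` are modified in place; an edge is deleted as soon as
    its flow vanishes. The returned lists hold the edges collected for each
    commodity, ordered from the terminal towards the source.
    """
    paths: Dict[int, List[Edge]] = {i: [] for i in position}
    for i in sorted(position):
        while True:
            here = position[i]
            # Any incoming edge (v, here) whose flow is at least d_i qualifies.
            edge = next((e for e in flow
                         if e[1] == here and flow[e] >= demand[i]), None)
            if edge is None:
                break
            paths[i].append(edge)
            flow[edge] -= demand[i]
            if flow[edge] == 0:
                del flow[edge]
            position[i] = edge[0]  # terminal t_i now sits at the tail v
    return paths


# Hypothetical instance: s -> a carries 5 units, split 2 + 3 towards t1 and t2.
flow = {("s", "a"): 5, ("a", "t1"): 2, ("a", "t2"): 3}
position = {1: "t1", 2: "t2"}
demand = {1: 2, 2: 3}
paths = preliminary_phase(flow, position, demand)
```

On this toy instance both terminals reach the source already in the preliminary phase, so the main loop would have nothing left to do.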



We augment the flow along C by decreasing the flow along the forward paths
and increasing the flow along the backward paths by the same amount, equal
to $\min\{\epsilon_1, \epsilon_2\}$. The quantity $\epsilon_1 > 0$ is the minimum flow along an edge on a
forward path of the cycle. The second quantity, $\epsilon_2$, is equal to $\min(d_j - f(e))$,
where the minimum is taken over all edges $e = (u, v)$ lying on backward paths
of the cycle and over all terminals $t_j$ at v for which $d_j > f(e)$. If the minimum is
achieved for an edge on a forward path then after the augmentation the flow on
this edge vanishes and so the edge disappears from the graph. If the minimum
is achieved for an edge $(u, t_j)$ on a backward path, then after the augmentation
the flow on $(u, t_j)$ is equal to $d_j$.
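
The choice of the augmentation amount can be sketched as below. The function name and the argument layout (edge lists for the forward and backward parts of the cycle, and a map from each vertex to the demands of the terminals currently located there) are assumptions made for this illustration.

```python
import math
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def augmentation_amount(forward: List[Edge],
                        backward: List[Edge],
                        flow: Dict[Edge, float],
                        terminals_at: Dict[str, List[float]]) -> float:
    """Return min(eps1, eps2) for one alternating cycle."""
    # eps1: smallest flow on any forward edge of the cycle.
    eps1 = min(flow[e] for e in forward)
    # eps2: smallest gap d_j - f(e) over backward edges e = (u, v) and
    # terminals t_j located at v whose demand exceeds the flow on e.
    eps2 = math.inf
    for (u, v) in backward:
        for d in terminals_at.get(v, []):
            if d > flow[(u, v)]:
                eps2 = min(eps2, d - flow[(u, v)])
    return min(eps1, eps2)


# Hypothetical cycle: forward edges carry 3 and 5, the backward edge carries 2
# and ends at a terminal with demand 4.
flow = {("a", "b"): 3, ("b", "c"): 5, ("u", "t"): 2}
eps = augmentation_amount([("a", "b"), ("b", "c")], [("u", "t")],
                          flow, {"t": [4]})
```

Here the backward edge determines the amount (4 - 2 = 2), so after the augmentation its flow equals the demand of the terminal at its head.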

Analysis Overview. The correctness of the algorithm is based on the following 
two facts: the first is that at the beginning of any iteration, the in-degree of 
any vertex containing one or more terminals is at least 2; the second, which is
a consequence of the first fact, is that as long as all terminals have not reached 
the source, the algorithm always finds an alternating cycle. 

At each iteration, after augmentation either the flow on some forward edge
vanishes and so the edge disappears from the graph or the flow on a backward
edge $(u, t_j)$ is equal to $d_j$ and so the edge disappears from the graph after moving
the terminal $t_j$ to u, decreasing the flow on the edge $(u, t_j)$ to zero and removing
this edge from the graph. So, as a result of each iteration, at least one edge is
eliminated and the algorithm makes progress. Before an edge becomes a singular 
edge, the flow on it does not increase. After the edge becomes a singular edge 
we move at most one terminal along this edge and then this edge vanishes. Thus 
the total unsplittable flow through this edge is less than the sum of its initial 
flow and the maximum demand and the performance guarantee is at most 2. We 
refer the reader to [6] for more details. 

Running Time. Since every augmentation removes at least one edge, the number
of augmentations is at most m, where $m = |E|$. An augmenting cycle can be
found in $O(n)$ time, where $n = |V|$. The time for moving terminals is $O(kn)$,
where k denotes the number of terminals. Since there are k terminals, computing
$\epsilon_2$ requires $O(k)$ time in each iteration. Therefore the running time of the
algorithm is $O(nm + km)$.

Heuristic Improvement. We have a second implementation with an added heuris- 
tic. The purpose of the heuristic is to try to reduce the congestion. The heuristic 
is designed so that it does not affect the theoretical performance guarantee of the 
original algorithm, but as a sacrifice, the running time is increased. In our sec- 
ond implementation, we use the heuristic only when we determine an alternating 
cycle. We always pick an outgoing edge with the smallest flow to go forward and 
choose an incoming edge with the largest flow to go backward. For most of the 
cases we tested, as we show in the experimental results in Section 5, the conges- 
tion is reduced somewhat. For some cases, it is reduced a lot. The running time 
for the new implementation with the heuristic is $O(dnm + km)$, where d is the
maximum value of incoming and outgoing degrees among all vertices, since the
time for finding an alternating cycle is now $O(dn)$.
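
The edge-selection rule of the heuristic can be sketched as follows. The helper names and the dictionary representation are assumptions for illustration, and the restriction of backward steps to singular edges is omitted for brevity. With adjacency lists each choice inspects at most d edges, which is where the extra factor d in the cycle-finding time comes from; the sketch below simply scans the whole edge dictionary.

```python
from typing import Dict, Optional, Tuple

Edge = Tuple[str, str]


def forward_choice(v: str, flow: Dict[Edge, float]) -> Optional[Edge]:
    """Outgoing edge of v with the smallest flow, or None if v has none."""
    out = [e for e in flow if e[0] == v]
    return min(out, key=flow.get) if out else None


def backward_choice(v: str, flow: Dict[Edge, float],
                    exclude: Optional[Edge] = None) -> Optional[Edge]:
    """Incoming edge of v with the largest flow, skipping `exclude`."""
    inc = [e for e in flow if e[1] == v and e != exclude]
    return max(inc, key=flow.get) if inc else None


# Hypothetical neighbourhood of a vertex v.
flow = {("v", "a"): 4, ("v", "b"): 1, ("x", "v"): 2, ("y", "v"): 7}
go_forward = forward_choice("v", flow)
go_backward = backward_choice("v", flow)
```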

3 The (3, 1)-Approximation Algorithm for Congestion
and Cost

The KSA [14,15] iteratively improves a solution by doubling the flow amount on
each path utilized by a commodity which has not yet been routed unsplittably.
This scheme can be implemented to give a (2, 1)-approximation if all demands are
powers of 2. Skutella's algorithm [19] combines this idea with a clever initialization
which rounds down the demands to powers of 2 by removing the most costly
paths from the solution. In Fig. 2 we give the algorithm for the case of powers of
2 [14,15] in the more efficient implementation of Skutella [19]. The main idea behind
the analysis of the congestion guarantee is that the total increase of an edge
capacity across all iterations is bounded by $\sum_{l < \log(d_{\max}/d_{\min})} d_{\min} 2^{l} < d_{\max}$.
Power-Algorithm:

Input: A directed graph G = (V, E) with non-negative costs on the edges, a
source vertex s ∈ V, k commodities i = 1, ..., k with terminals t_i ∈ V \ {s}
and positive demands d_i = d_min · 2^{q_i}, q_i ∈ N, q_1 ≤ q_2 ≤ ... ≤ q_k, and a
(splittable) flow f_0 on G satisfying all demands.

Output: An unsplittable flow given by a path P_i from s to each terminal t_i,
1 ≤ i ≤ k.

i := 1; j := 0;
while d_min · 2^j ≤ d_max do
    j := j + 1; δ_j := d_min · 2^{j-1};
    for every edge e ∈ E, set its capacity u_e to f_{j-1}(e) rounded up to the
    nearest multiple of δ_j;
    compute a feasible δ_j-integral flow f_j satisfying all demands with
    c(f_j) ≤ c(f_{j-1});
    remove all edges e with f_j(e) = 0 from G;
    while i ≤ k and d_i = δ_j do
        determine an arbitrary path P_i from s to t_i in G;
        decrease f_j along P_i by d_i;
        remove all edges e with f_j(e) = 0 from G;
        i := i + 1;
return P_1, ..., P_k;



Fig. 2. The SA after all demands have been rounded to powers of 2 
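
Two small ingredients of Fig. 2 can be sketched in isolation: the sequence of granularities used by the outer loop and the rounding of an edge capacity up to the nearest multiple of the current granularity. The function names are assumptions for illustration, and the sketch presumes that the ratio of the maximum to the minimum demand is a power of 2, as in the setting of the figure.

```python
from typing import List


def delta_schedule(d_min: int, d_max: int) -> List[int]:
    """Granularities d_min * 2**(j-1) produced by the outer while-loop."""
    deltas = []
    j = 0
    while d_min * 2 ** j <= d_max:
        j += 1
        deltas.append(d_min * 2 ** (j - 1))
    return deltas


def round_up_capacity(flow_value: float, delta: float) -> float:
    """Smallest multiple of delta that is at least flow_value."""
    # -(-x // y) is ceiling division without converting to float.
    return -(-flow_value // delta) * delta


deltas = delta_schedule(1, 8)        # one value per outer iteration
capacity = round_up_capacity(5, 4)   # flow 5 rounded up to a multiple of 4
```

For a minimum demand of 1 and a maximum demand of 8 the loop runs four times, in line with an iteration count of one plus the logarithm of the demand ratio.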



Running time. The running time of the Power-Algorithm is dominated by
the time to compute a $\delta_j$-integral flow $f_j$ in each while-loop iteration j. Given the
flow $f_{j-1}$, this can be done in the following way [19]. We consider the subgraph
of the current graph G which is induced by all edges e whose flow value $f_{j-1}(e)$
is not $\delta_j$-integral. Starting at an arbitrary vertex of this subgraph and ignoring
directions of edges, we greedily determine a cycle C; this is possible since, due
to flow conservation, the degree of every vertex is at least two. Then, we choose
the orientation of the augmentation on C so that the cost of the flow is not
increased. We augment flow on the edges of C whose direction is identical to
the augmentation orientation and decrease flow by the same amount on the
other edges of C until the flow value on one of the edges becomes $\delta_j$-integral. We
delete all $\delta_j$-integral edges and continue iteratively. This process terminates after
at most m iterations and thus has running time $O(nm)$. The number of while-loop
iterations is $1 + \log(d_{\max}/d_{\min})$. The running time of the first iteration
is $O(nm)$ as discussed above. However, since $f_{j-1}$ is $(d_{\min} \cdot 2^{j-2})$-integral in
each further iteration $j \ge 2$, the amount of augmented flow along a cycle C is
$d_{\min} \cdot 2^{j-2}$ and after the augmentation the flow on each edge of C is
$(d_{\min} \cdot 2^{j-1})$-integral; thus no edge of C is involved in the remaining cycle
augmentation steps of this iteration. So the computation of $f_j$ from $f_{j-1}$ takes
only $O(m)$ time. Moreover the path $P_i$ can be determined in $O(n)$ time for
each commodity i and the total running time of the Power-Algorithm is
$O(kn + m \log(d_{\max}/d_{\min}) + nm)$.
We now present the General-Algorithm [19] which works for arbitrary
demand values. In the remainder of the paper when we refer to the SA we mean
the General-Algorithm. It constructs an unsplittable flow by rounding down
the demand values such that the rounded demands satisfy the condition for using
the Power-Algorithm. Then, the latter algorithm is called to compute paths
$P_1, \ldots, P_k$. Finally, the original demand of commodity i, $1 \le i \le k$, is routed
across path $P_i$. In contrast, the KSA rounds demands up to the closest power of
2 before invoking the analogue of the Power-Algorithm.

We may assume that the graph is acyclic, which can be achieved by removing 
all edges with flow value 0 and iteratively reducing flow along directed cycles. 
This can be implemented in $O(nm)$ time using standard methods.

In the first step of the General-Algorithm, we round down all demands
$d_i$ to

$$\bar d_i := d_{\min} \cdot 2^{\lfloor \log(d_i/d_{\min}) \rfloor}.$$
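
The rounding step can be sketched as follows. The function name is an assumption for illustration, and repeated doubling is used in place of a floating-point logarithm so that integral demands stay exact.

```python
def round_down_demand(d: int, d_min: int) -> int:
    """Largest value d_min * 2**q (q a non-negative integer) not exceeding d."""
    if d < d_min:
        raise ValueError("every demand is at least the minimum demand")
    rounded = d_min
    while rounded * 2 <= d:
        rounded *= 2
    return rounded


rounded = [round_down_demand(d, 1) for d in (1, 3, 8, 100)]
```

Each rounded demand is more than half of the original one, which is why routing the original demand along the path found for the rounded demand at most doubles the flow contributed by that commodity.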

Then, in a second step, we modify the flow f such that it only satisfies the
rounded demands $\bar d_i$, $1 \le i \le k$. The algorithm deals with the commodities i
one after another and iteratively reduces the flow f along the most expensive
s-$t_i$-paths within f (ignoring or removing edges with flow value zero) until the
inflow in node $t_i$ has been decreased by $d_i - \bar d_i$. So, when we reroute this amount
of reduced flow along any s-$t_i$-paths within the updated f, the cost of this part
of the flow will not increase. Since the underlying graph has no directed cycles,
a most expensive s-$t_i$-path can be computed in polynomial time. Notice that the
resulting flow $\bar f$ satisfies all rounded demands. Thus, the Power-Algorithm
can be used to turn $\bar f$ into an unsplittable flow $\tilde f$ for the rounded instance with
$c(\tilde f) \le c(\bar f)$. The General-Algorithm constructs an unsplittable flow $\hat f$ for
the original instance by routing, for each commodity i, the total demand $d_i$
(instead of only $\bar d_i$) along the path $P_i$ returned by the Power-Algorithm, and
the cost of $\hat f$ is bounded by
$c(\hat f) = c(\tilde f) + \sum_{i=1}^{k} (d_i - \bar d_i) c(P_i) \le c(\bar f) + \sum_{i=1}^{k} (d_i - \bar d_i) c(P_i) \le c(f)$.
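
Because the flow has been made acyclic, a most expensive path from the source to a terminal can be found by one pass over a topological order. The sketch below is one standard way to do this and is not taken from the implementation evaluated here; the function name and the dictionary of edge costs (restricted to edges carrying flow) are assumptions.

```python
from collections import defaultdict, deque
from typing import Dict, List, Optional, Tuple

Edge = Tuple[str, str]


def most_expensive_path(cost: Dict[Edge, float], s: str,
                        t: str) -> Optional[Tuple[float, List[str]]]:
    """Return (cost, vertex list) of a costliest s-t path in a DAG, or None."""
    succ = defaultdict(list)
    indeg = defaultdict(int)
    nodes = set()
    for (u, v) in cost:
        succ[u].append(v)
        indeg[v] += 1
        nodes.update((u, v))
    # Kahn's algorithm yields a topological order of the acyclic graph.
    queue = deque(v for v in nodes if indeg[v] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    # Relax edges in topological order, keeping the larger path cost.
    best = {s: 0.0}
    pred: Dict[str, str] = {}
    for u in order:
        if u not in best:
            continue  # u is not reachable from s
        for v in succ[u]:
            cand = best[u] + cost[(u, v)]
            if v not in best or cand > best[v]:
                best[v] = cand
                pred[v] = u
    if t not in best:
        return None
    path = [t]
    while path[-1] != s:
        path.append(pred[path[-1]])
    return best[t], path[::-1]


# Hypothetical costs on the edges that carry flow.
cost = {("s", "a"): 1, ("s", "b"): 4, ("a", "t"): 5, ("b", "t"): 1, ("a", "b"): 2}
result = most_expensive_path(cost, "s", "t")
```

The pass touches every edge a constant number of times, so one computation is linear in the number of edges.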

Skutella [19] shows that the General-Algorithm finds an unsplittable flow
whose cost is bounded by the cost of the initial flow f and the flow value on
any edge e is less than $2f(e) + d_{\max}$. Therefore, if the instance is balanced, i.e.,
the assumption that $d_{\max} \le u_{\min}$ is satisfied, where $u_{\min}$ is the minimum edge
capacity, an unsplittable flow whose cost is
bounded by the cost of the initial flow and whose congestion is less than 3 can be
obtained. Furthermore, if we use a minimum-cost flow algorithm to find a feasible
splittable flow of minimum cost for the initial flow, the cost of an unsplittable
flow obtained by the General-Algorithm is bounded by this minimum cost.

Running time. The procedure for obtaining the flow $\bar f$ for the rounded demands
from the initial flow f can be implemented to run
in $O(m^2)$ time; in each iteration of the procedure, computing the most expensive
paths from s to all vertices in the current acyclic network takes $O(m)$ time, and
the number of iterations can be bounded by $O(m)$. Thus, the running time
of the General-Algorithm is $O(m^2)$ plus the running time of the Power-Algorithm,
i.e., $O(m^2 + kn + m \log(d_{\max}/d_{\min}))$. The first term can usually be
improved using a suitable min-cost flow algorithm [19]. We examine this further
in Section 5.
In our implementation, the variable $\delta_j$ adopts only the distinct rounded demand
values. We have two reasons for doing that. The first is that it is not
necessary for $\delta_j$ to adopt a value of the form $d_{\min} \cdot 2^{i}$ when it is not a rounded
demand value, and as a result we may have fewer iterations. The second
reason is the following heuristic we intend to use.
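
The effect of restricting the granularity to the distinct rounded demand values can be illustrated with a short sketch. The function names and the sample demands are assumptions made for this example.

```python
from typing import List


def round_down(d: int, d_min: int) -> int:
    """Largest d_min * 2**q not exceeding d (requires d >= d_min)."""
    value = d_min
    while value * 2 <= d:
        value *= 2
    return value


def distinct_rounded_demands(demands: List[int]) -> List[int]:
    """Sorted distinct rounded demands, i.e. the granularities actually used."""
    d_min = min(demands)
    return sorted({round_down(d, d_min) for d in demands})


# Hypothetical demand multiset with minimum 1 and maximum 128.
schedule = distinct_rounded_demands([1, 3, 5, 100, 128])
full_schedule = [2 ** q for q in range(8)]  # every power of 2 from 1 to 128
```

Here only five granularities occur instead of the eight powers of 2 between the minimum and maximum demand, so three iterations of the outer loop are saved.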

Heuristic improvement. We have a second implementation of the SA in which 
we try to select augmenting cycles in a more sophisticated manner. When we 
look for an augmenting cycle in iteration j, at the current vertex we always pick 
an outgoing or incoming edge on which the flow value is not $\delta_j$-integral and the
difference between $\delta_j$ and the remainder of the flow value with respect to $\delta_j$ is
minimal. Unfortunately, the benefit of this heuristic seems to be very limited. We
give details in Section 5. As mentioned above, in our implementation the variable
$\delta_j$ adopts only the different rounded demand values. Since the time for finding
an augmenting cycle in the implementation with the heuristic is $O(dn)$, where d
is the maximum value of in- and outdegrees among all vertices, the worst-case
running time for the implementation with the heuristic is $O(m^2 + dknm)$.
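
The selection rule of this heuristic can be sketched as follows. The function name and the argument layout (the edges incident to the current vertex plus a flow dictionary) are assumptions for illustration only.

```python
from typing import Dict, List, Optional, Tuple

Edge = Tuple[str, str]


def pick_cycle_edge(incident: List[Edge], flow: Dict[Edge, float],
                    delta: float) -> Optional[Edge]:
    """Among incident edges whose flow is not a multiple of delta, return the
    one for which delta minus the remainder (flow mod delta) is smallest."""
    candidates = [e for e in incident if flow[e] % delta != 0]
    if not candidates:
        return None
    return min(candidates, key=lambda e: delta - flow[e] % delta)


# Hypothetical situation at vertex v with delta = 4: the remainders are
# 1, 3 and 0, so the gaps to the next multiple are 3, 1 and none.
flow = {("v", "a"): 5, ("b", "v"): 7, ("v", "c"): 8}
choice = pick_cycle_edge(list(flow), flow, 4)
```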

4 Experimental Framework 

Software and hardware resources. We conducted our experiments on a sun4u
sparc SUNW Ultra-5_10 workstation with 640 MB of RAM and 979 MB swap
space. The operating system was SunOS, release 5.8 Generic_108528-14. Our
programs were written in C and compiled using gcc, version 2.95, with the -O3
optimization option.

Codes tested. The fastest maximum flow algorithm to date is due to Goldberg and
Rao [8] with a running time of $O(\min\{n^{2/3}, m^{1/2}\}\, m \log(n^2/m) \log U)$, where U
is an upper bound on the edge capacities which are assumed to be integral.
However in practice preflow-push [10] algorithms are the fastest. We use the
preflow-push Cherkassky-Goldberg code kit [5] to find a maximum flow as an
initial fractional flow. We assume integral capacities and demands in the unsplittable
flow input. We implement and test the following codes:

2alg: this is the DGGA without any heuristic.

2alg_h: version of 2alg with the heuristic described in Section 2.

3skut: this is the SA without any heuristic.

3skut_h: version of 3skut with the heuristic described at the end of Section 3.

In addition we compare against the programs 3al and 3al2 used in [15], where 
3al is an implementation of the KSA. The program 3al2 is an implementation of 
the same algorithm, where to improve the running time the edges carrying zero 
flow in the initial fractional solution are discarded. Note that both the DGGA 
and the SA discard these edges as well. 

Input classes. We generated data from the same four input classes designed by 
Kolliopoulos and Stein [15]. For each class we generated a variety of instances 
varying different parameters. The generators use randomness to produce different 
instances for the same parameter values. To make our experiments repeatable
the seed of the pseudorandom generator is an input parameter for all generators. 
If no seed is given, a fixed default value is chosen. We used the default seed in 
generating all inputs. The four classes used are defined next. Whenever the term 
“randomly” is used in the following, we mean uniformly at random. For the 
inputs to 3skut and 3skut_h, we also randomly generate a cost value on each
edge using the default seed. 

genrmf. This is adapted from the GENRMF generator of Goldfarb and Grigoriadis
[11,2]. The input parameters are a b c1 c2 k d. The generated network
has b frames (grids) of size a x a, for a total of a * a * b vertices. In each frame
each vertex is connected with its neighbors in all four directions. In addition,
the vertices of a frame are connected one-to-one with the vertices of the next
frame via a random permutation of those vertices. The source is the lower left
vertex of the first frame. Vertices become sinks with probability 1/k and their
demand is chosen uniformly at random from the interval [1, d]. The capacities
are randomly chosen integers from (c1, c2) in the case of interframe edges, and
(1, c2 * a * a) for the in-frame edges.

noigen. This is adapted from the noigen generator used in [3,17] for minimum
cut experimentation. The input parameters are n d t p k. The network has n
nodes and $\lfloor n(n-1)d/200 \rfloor$ edges. Vertices are randomly distributed among t
components. Capacities are chosen uniformly from a prespecified range [l, 2l] in
the case of intercomponent edges and from [pl, 2pl] for intracomponent edges, p
being a positive integer. Only vertices belonging to one of the t - 1 components
not containing the source can become sinks, each with probability 1/k. The
desired effect of the construction is for commodities to contend for the light
intercomponent cuts. Demand for commodities is chosen uniformly from the
range [1, 2l].

rangen. This generates a random graph G(n, p) with input parameters n p c1 c2
k d, where n is the number of nodes, p is the edge probability, capacities are
in the range (c1, c2), k is the number of commodities and demands are in the
range (0, d).
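
A generator in the spirit of rangen can be sketched in a few lines. This is not the generator used in our experiments: the function name, the choice of vertex 0 as the source, the treatment of every ordered pair of vertices as an independent candidate edge, the uniform choice of k distinct sinks, and integer demands drawn from 1 to d are all assumptions made for illustration. What it does show is how an explicit seed makes instance generation repeatable.

```python
import random
from typing import Dict, Tuple

Edge = Tuple[int, int]


def rangen_sketch(n: int, p: float, c1: int, c2: int, k: int, d: int,
                  seed: int = 0) -> Tuple[Dict[Edge, int], Dict[int, int]]:
    """Return (capacities, demands) of a random single-source instance."""
    rng = random.Random(seed)  # private generator: same seed, same instance
    capacity: Dict[Edge, int] = {}
    for u in range(n):
        for v in range(n):
            # Assumption: each ordered pair becomes an edge with probability p.
            if u != v and rng.random() < p:
                capacity[(u, v)] = rng.randint(c1, c2)
    # Assumption: vertex 0 is the source; k distinct other vertices are sinks.
    sinks = rng.sample(range(1, n), k)
    demand = {t: rng.randint(1, d) for t in sinks}
    return capacity, demand


capacity, demand = rangen_sketch(n=20, p=0.3, c1=1, c2=16, k=5, d=16, seed=7)
```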



Table 1. family sat_density: satgen -a 1000 -b i -c1 8 -c2 16 -k 10000 -d 8; 9967, 20076,
59977, 120081, 250379, 500828 edges; 22, 61, 138, 281, 682, 1350 commodities; i is the
expected percentage of pairs of vertices joined by an edge.

            i = 1       i = 2       i = 6       i = 12      i = 25      i = 50
program     cong. time  cong. time  cong. time  cong. time  cong. time  cong. time
2alg         1.40    0   1.50    0   1.63    0   1.63    0   1.75    0   1.75    1
2alg_h       1.33    0   1.50    0   1.42    0   1.60    0   1.75    0   1.64    0
3al          1.55    1   1.56    1   1.67    4   1.64    9   1.64   23   1.70   54
3al2         1.45    0   1.67    1   1.78    2   1.57    5   1.67   13   1.75   32
3skut        1.70    0   1.64    1   2.22    1   2.11    1   2.11    2   2.33    6
3skut_h      1.70    1   1.64    0   2.22    1   2.11    1   2.11    2   2.33    6

Table 2. family noi_commodities: noigen 1000 1 2 10 i; 7975 edges; 2-unbalanced
family; i is the expected percentage of sinks in the non-source component.

            i = 2       i = 6       i = 12      i = 25      i = 50      i = 100
program     cong. time  cong. time  cong. time  cong. time  cong. time  cong. time
2alg         1.30    0   2.37    0   2.14    0   1.43    0   1.26    0   1.09    0
2alg_h       1.30    0   1.73    0   2.14    0   1.49    0   1.17    0   1.09    0
3al          2.21    0   1.73    0   2.41    1   1.70    1   1.26    1   1.14    1
3al2         2.21    0   1.88    0   2.29    0   1.52    1   1.27    1   1.18    1
3skut        1.40    0   1.65    0   2.15    0   1.85    0   1.45    0   1.49    ?
3skut_h      1.40    0   1.65    0   2.15    0   1.85    0   1.45    0   1.49    0



Table 3. family rmf_commDem: genrmf -a 10 -b 64 -c1 64 -c2 128 -k i -d 128; 29340
edges; 2-unbalanced family; i is the expected percentage of sinks among the vertices.

            i = 2       i = 5       i = 10      i = 20      i = 50      i = 70
program     cong. time  cong. time  cong. time  cong. time  cong. time  cong. time
2alg         2.71    8   1.77   14   1.41   15   1.21   21   1.09   33   1.06   42
2alg_h       2.47    8   1.75   14   1.36   15   1.19   19   1.08   31   1.06   42
3al          2.76    6   1.89   13   1.57   23   1.37   49   1.19  138   1.14  196
3al2         2.79    4   1.89    7   1.54   14   1.30   33   1.16   91   1.15  132
3skut        3.35    1   2.39    1   2.07    2   1.76    1   1.63    2   1.62    3
3skut_h      3.35    1   2.39    1   2.07    1   1.76    2   1.63    2   1.62    3



satgen. It first generates a random graph G(n, p) as in rangen and then uses the
following procedure to designate commodities. Two vertices s and t are picked 
from G and maximum flow is computed from s to t. Let v be the value of the 
flow. New nodes corresponding to sinks are incrementally added each connected 
only to t and with a randomly chosen demand value. The process stops when the 
total demand reaches v, the value of the minimum s-t cut or when the number 
of added commodities reaches the input parameter k, typically given as a crude 
upper bound. 
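
The stopping rule for adding commodities can be sketched as follows. The maximum flow value is taken as an input rather than computed, and the clipping of the last demand so that the total never exceeds the flow value is an assumption of this sketch, as are the function name and the range from which demands are drawn.

```python
import random
from typing import List


def designate_commodities(flow_value: int, max_demand: int, k: int,
                          seed: int = 0) -> List[int]:
    """Draw demands for new sinks attached to t until the total demand
    reaches flow_value or k sinks have been added."""
    rng = random.Random(seed)
    demands: List[int] = []
    total = 0
    while total < flow_value and len(demands) < k:
        # Assumption: the last demand is clipped to the remaining flow value.
        dem = min(rng.randint(1, max_demand), flow_value - total)
        demands.append(dem)
        total += dem
    return demands


# With a crude upper bound such as k = 10000 the flow value is the binding limit.
demands = designate_commodities(flow_value=50, max_demand=8, k=10000)
```

With a small k the second stopping condition takes over instead, so the number of commodities in a generated instance is at most k but typically far smaller when k is chosen as a crude upper bound.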



Table 4. family ran_dense: rangen -a i*2 -b 30 -c1 1 -c2 16 -k i -d 16; 1182, 4807, 19416,
78156, 313660, 1258786 edges; 16-unbalanced family; i*2 is the number of vertices.

            i = 32      i = 64      i = 128     i = 256     i = 512     i = 1024
program     cong. time  cong. time  cong. time  cong. time  cong. time  cong. time
2alg         4.67    0   3.75    0   7.50    0   7.00    0   7.50    0   8.00    0
2alg_h       2.50    0   3.25    0   6.50    0   7.00    0   7.50    1   7.50    0
3al          4.67    0   8.00    0   8.00    ?   8.00    ?   7.50   30   7.50  136
3al2         4.67    0   6.50    0   7.50    0   8.50    3   7.50   17   8.00   82
3skut        5.00    0   7.00    0   7.50    ?      ?    ?   7.50    4   7.50   19
3skut_h      5.00    0   7.00    0   7.50    0   7.00    1   7.50    4   7.50   18

5 Experimental Results



In this section we give an overview of the experimental results. In all algorithms 
we study, starting with a different fractional flow may give different unsplittable 
solutions. Hence in order to make a meaningful comparison of the experimental
results of the SA against the results of the DGGA and KSA, we use the same 
initial fractional flow for all three. If the SA was used in isolation, one could use, 
as mentioned in Section 3, a min-cost flow algorithm to find the initial fractional 
flow and therefore obtain a best possible budget. 

The implementations follow the algorithm descriptions as given earlier. In 
the case of the SA, after finding the initial fractional flow f, one has to iteratively
reduce flow, for each commodity i, along the most expensive s-$t_i$-paths
used by f until the inflow in terminal $t_i$ has been decreased by $d_i - \bar d_i$, where
$\bar d_i$ stands for the rounded demand. Instead of doing this explicitly, as Skutella
[19] suggests, we set the capacity of each edge e to f(e) and use an arbitrary
min-cost flow algorithm to find a min-cost flow that satisfies the rounded demands.
Because of this, the term $O(m^2)$ in the running time of Algorithm 3
in Section 3 can be replaced by the running time of an arbitrary min-cost flow
algorithm. The running times of the currently best known min-cost flow algorithms
are $O(nm \log(n^2/m) \log(nC))$ [9], $O(nm (\log\log U) \log(nC))$ [1], and
$O((m \log n)(m + n \log n))$ [18]. The code we use is again due to Cherkassky and
Goldberg [4]. The experimental results for all the implementations are given in 
Tables 1-7. The wall-clock running time is given in seconds, with a running time 
of 0 denoting a time less than a second. We gave earlier the theoretical running 
times for the algorithms we implement but one should bear in mind that the 
real running time depends also on other factors such as the data structures used. 
Apart from standard linked and adjacency lists no other data structures were 
used in our codes. As mentioned in the introduction, speeding up the codes was 
not our primary focus. This aspect could be pursued further in future work. 

The DGGA vs. the KSA. We first compare the results of the 2- and the 3- 
approximation algorithms since they are both algorithms for congestion without 
costs. On a balanced input (see Table 1), the congestion achieved by the DGGA, 
with or without heuristics, was typically less than or equal to 1.75. The con- 
gestion achieved by the KSA was almost in the same range. For each balanced 
input, the difference in the congestion achieved by these two algorithms was not 
obvious, but the DGGA’s congestion was typically somewhat better. The obvious 
difference occurred in running time. Before starting measuring running time, we 
use the Cherkassky-Goldberg code kit to find a feasible splittable flow (if neces- 
sary we use other subroutines to scale up the capacities by the minimum amount 
needed to achieve a fractional congestion of 1), and then we create an array of 
nodes to represent this input graph (discarding all zero flow edges) and delete 
all flow cycles. After that, we start measuring the running time and applying the 
DGGA. The starting point for measuring the running time in the implementation 
of the KSA is also set after the termination of the Cherkassky-Goldberg code.
To test the robustness of the approximation guarantee we relaxed, on several
instances, the balance assumption allowing the maximum demand to be twice 
the minimum capacity or more, see Tables 2-6. The approximation guarantee 
of each individual algorithm was not significantly affected. Even in the extreme 
case when the balance assumption was violated by a factor of 16, as in Table 4, 
the code 2alg achieved 8 and the code 2alg_h achieved 7.5. Relatively speaking
though the difference in the congestion achieved between 2alg, 2alg_h and 3al,
3al2 is much more pronounced compared to the inputs with small imbalance. See
the big differences in the first two columns of Table 4 (in Column 2, 2alg: 3.75 and
3al2: 6.5). Hence the DGGA is more robust against highly unbalanced inputs. 
This is consistent with the behavior of the KSA which keeps increasing the edge 
capacities by a fixed amount in each iteration before routing any new flow. In 
contrast, the DGGA increases congestion only when some flow is actually rerouted 
through an edge. As shown in Table 4, the SA which behaves similarly to the 
KSA exhibits the same lack of robustness for highly unbalanced inputs. 

We also observed that the benefit of the heuristic used in our 2alg_h imple- 
mentation showed up in our experimental results. For most of the inputs, the 
congestion was improved, although rarely by more than 5%. Some significant 
improvements were observed when the input was very unbalanced, see Table 4. 
Theoretically, the running time for the program with the heuristic should in- 
crease by a certain amount. But in our experiments, the running time stayed 
virtually the same. This phenomenon was beyond what we expected. 

In summary, the DGGA performs typically better than the KSA for conges- 
tion. The average improvement for Tables 1-4 is 6.5%, 6.3%, 9.4%, and 53% 
with the occasional exception where the KSA outperformed the DGGA. This 
behavior is consistent with the fact that the DGGA is a theoretically better algo- 
rithm. Moreover the theoretical advantage typically translates to a much smaller 
advantage in practice. 

The difference in the running time for these two approximation algorithms 
was fairly significant in our experiments especially for dense graphs with a large 
number of commodities. The DGGA runs much faster than the implementation of 
the KSA we used. We proceed to give two possible reasons for this phenomenon. 

The first reason is the difference in complexity for these two implementations. 
Recall that the running time of the DGGA is $O(mn + km)$ and the running time
of the implementation of the KSA that we used is $O(k (d_{\max}/d_{\min}) m)$ [14,15].
We emphasize that a polynomial-time implementation is possible (see [14,15]). 
In fact Skutella’s Power-Algorithm can be seen as a much more efficient 
implementation of essentially the same algorithm. A second reason is that the 
DGGA processes the graph in a localized manner, i.e., finding an alternating 
cycle locally and increasing a certain amount of flow on it, while the 3al, 3al2
codes repeatedly compute maximum flows on the full graph. 

The Skutella algorithm. We now examine the congestion achieved by the SA. On 
a balanced input (see Table 1), the congestion achieved by the SA was typically 
greater than 1.64 and less than or equal to 2.33. This range of congestion is bigger 
than the range [1.33,1.75] achieved by the DGGA and the range [1.45,1.78] by 
Table 5. family noi_components: noigen 1000 1 i 10 50; 7975 edges; 2-unbalanced
family; i is the number of components.

            i = 3       i = 6       i = 12      i = 24      i = 48
program     cong. time  cong. time  cong. time  cong. time  cong. time
2alg         1.10    0   1.07    0   1.04    0   1.01    0   1.01    0
2alg_h       1.12    0   1.05    0   1.02    0   1.01    0   1.01    0
3al          1.22    2   1.13    1   1.12    1   1.07    1   1.08    1
3al2         1.22    1   1.13    0   1.11    1   1.04    1   1.09    1
3skut        1.46    0   1.43    0   1.37    0   1.37    0   1.38    0
3skut_h      1.46    0   1.43    0   1.37    0   1.37    0   1.38    0



Table 6. family rmf_depthDem: genrmf -a 10 -b i -c1 64 -c2 128 -k 40 -d 128; 820,
1740, 7260, 29340 edges; 2-unbalanced family; i is the number of frames.

            i = 2       i = 4       i = 16      i = 64
program     cong. time  cong. time  cong. time  cong. time
2alg         2.30    0   2.47    0   1.36    1   1.11   28
2alg_h       1.89    0   2.64    0   1.37    1   1.09   27
3al          2.26    0   2.18    0   1.48    5   1.21  103
3al2         1.82    0   2.72    0   1.46    3   1.20   77
3skut        2.72    0   2.71    0   1.87    1   1.72    1
3skut_h      2.72    0   2.71    0   1.87    0   1.72    2



the KSA. More precisely, the absolute difference in congestion between the SA 
and the DGGA or KSA is on average around 0.4. We think that this nontrivial 
difference in congestion is partially or mainly caused by the involvement of costs 
on the edges and the simultaneous performance guarantee of 1 for cost of the 
SA. The constraint that the flow found at each step should not increase the cost 
limits the routing options. 

In the implementation of the (3, 1)-approximation algorithm we start measuring
the running time just before applying a min-cost flow algorithm [4] to find
a min-cost flow for the rounded demands. Before that starting point of running 
time, we use the Cherkassky-Goldberg code kit to find a feasible splittable flow 
(if necessary, as we did before, we use other subroutines to scale up the capacities 
by the minimum amount to get the optimal fractional congestion), and then we 
create the input data for the min-cost flow subroutine, i.e., setting the capacity 
of each edge to its flow value and the demand of each commodity i to the rounded demand $\bar d_i$. For
balanced input instances in Table 1, the running time of the SA is much better 
than that of the KSA but slightly more than that of the DGGA. Actually, as we 
observed in testing, most of the running time for the SA is spent in finding the 
initial min-cost flow. 

To test the robustness of the approximation guarantee achieved by the SA we 
used the instances with the relaxed balance assumption. Even in the extreme
case when the balance assumption was violated by a factor of 16, as in Table 4,
the code 3skut achieved 7.50. The absolute difference in congestion achieved by
the codes 2alg, 3al and 3skut is typically small. The only big difference occurred
Table 7. Effect of our heuristic on the SA. Here the input instances in Columns 1 to
6 are the modified input instances in the last columns of Tables 1 to 6 whose original
demands, denoted by d, are modified as follows to the value d': d' = 1 if d = 2; d' = 2^2
if d = 3; d' = 2^4 if 8 ≤ d ≤ 16; d' = 2^6 if 32 ≤ d ≤ 64; in all other cases, d is
not changed. Note that the maximum demand value in our input instances is equal to
128 = 2^7.

            Column 1    Column 2    Column 3    Column 4    Column 5    Column 6
program     cong. time  cong. time  cong. time  cong. time  cong. time  cong. time
3skut        2.22    6   1.26    0   1.50    3   5.33   18   1.19    0   1.61    2
3skut_h      2.22    6   1.28    0   1.48    3   5.33   18   1.20    0   1.58    3



in the second output column in Table 4 (2alg: 3.75, 3al: 8.00 and 3skut: 7.00).
However, similar to the output in Table 1, the congestion achieved by the codes
2alg and 3al for an unbalanced input was typically better, see Tables 2-6. Given
the similarities between the KSA and SA the reason is, as mentioned above, the
involvement of costs on the edges and a simultaneous performance guarantee of
1 for cost in the (3, 1)-approximation algorithm. For the running time, things
are different. We can see from Tables 3 and 6 that the code 3skut runs much
faster than 2alg and 3al when the size of the input is large. This is probably
because after the rounding stage the number of the distinct rounded demand 
values, which is the number of iterations in our implementation, is small (equal 
to 7 in Tables 3 and 6) and the number of augmenting cycles (to be chosen 
iteratively) in most of the iterations is not very large. If this is the case, the 
execution of these iterations could be finished in a very short period of time and 
the total running time is thus short also. 

Effect of the heuristic on the SA. No benefit of the heuristic used in our 3skut_h
implementation showed in Tables 1-6. This is because in each iteration (except
the stage of finding a min-cost flow) the non-zero remainder of flow value on
each edge with respect to the rounded demand value of the current iteration is
exactly the same in our input instances. More precisely, in our input instances,
the variable $\delta_j$ adopts all values $d_{\min} \cdot 2^i$ between $d_{\min}$ and $d_{\max}$, and in this
case, in iteration j the remainder of flow value on any edge with respect to
$\delta_j = d_{\min} \cdot 2^{j-1}$ is either $d_{\min} \cdot 2^{j-2}$ or 0. So the amount of augmented flow along
an augmenting cycle C is $d_{\min} \cdot 2^{j-2}$ and after the augmentation the flow on
each edge of C is $\delta_j$-integral and thus all edges of C will not be involved in the
remaining augmentation procedure of this iteration. This is also probably the
reason why sometimes the SA runs faster than the DGGA. When the variable $\delta_j$
does not adopt all values $d_{\min} \cdot 2^i$ between $d_{\min}$ and $d_{\max}$, the heuristic proved
to be of some marginal usefulness. This can be seen from Table 7. The congestion
was improved in Columns 3 and 6 by 0.02 and 0.03, respectively, but in Columns
2 and 5 the congestion increased by 0.02 and 0.01, respectively.

In summary, in most of our experiments the DGGA and KSA achieved lower
congestion than the SA. Relative gains of the order of 35% or more are common,
especially for Tables 1, 3 and 4. This is mainly because the SA has a simultaneous
performance guarantee for the cost. The SA remains competitive and typically
achieved approximation ratios well below the theoretical guarantee. The 3skut
code runs much faster than 3al and occasionally faster than the 2alg code.

Abstract. In this work, we propose two variants of K-d trees where
fingers are used to improve the performance of orthogonal range search
and nearest neighbor queries when they exhibit locality of reference. The
experiments show that the second alternative yields significant savings.
Although it yields more modest improvements, the first variant does so
with much lower memory requirements and great simplicity, which makes
it more attractive on practical grounds.



1 Introduction 

The well-known, time-honored aphorism says that “80% of computer time is
spent on 20% of the data”. The actual percentages are unimportant but the
moral is that we can achieve significant improvements in performance if we
are able to exploit this fact. In on-line settings, where requests arrive one at
a time and they must be attended as soon as they arrive (or after some small
delay), we frequently encounter locality of reference, that is, for any time frame
only a small number of different requests among the possible ones are made or 
consecutive requests are close to each other in some sense. Locality of reference is 
systematically exploited in the design of memory hierarchies (disk and memory 
caches) and it is the rationale for many other techniques like buffering and self- 
adjustment [2,11]. 

The performance of searches and updates in data structures can be improved 
by augmenting the data structure with fingers, pointers to the hot spots in the 
data structure where most activity is going to be performed for a while (see for 
instance [9,3]). Thus, successive searches and updates do not start from scratch 
but use the clues provided by the finger(s), so that when the request affects some 
item in the “vicinity” of the finger(s) the request can be attended very efficiently. 

To the best of the authors’ knowledge, fingering techniques have not been
applied so far to multidimensional data structures. In this paper, we will
concentrate specifically on two variants of K-dimensional trees, namely standard K-d
trees [1] and relaxed K-d trees [6], but the techniques can easily be applied to
other multidimensional search trees and data structures. In general, multidimensional
data structures maintain a collection of items or records, each holding a
distinct K-dimensional key (which we may assume w.l.o.g. is a point in $[0,1]^K$).
Also, we will identify each item with its key and use both terms interchangeably.
Besides the usual insertions, deletions and (exact) searches, we will be interested in
providing efficient answers to questions like which records fall within a given
hyper-rectangle Q (orthogonal range search) or which is the closest record to
some given point q (nearest neighbor search) [10,8]. After a brief summary of
basic concepts, definitions and previous results in Section 2, we propose two alternative
designs that augment K-d trees^ with fingers to improve the efficiency of
orthogonal range and nearest neighbor searches (Section 3). We thereafter study
their performance under reasonable models which exhibit locality of reference
(Section 4). While it seems difficult to improve the performance of multidimensional
data structures using self-adjusting techniques (as reorganizations in this
type of data structure are too expensive), fingering yields significant savings and
is easy to implement. Our experiments show that the second, more complex
scheme of m-finger K-d trees exploits the locality of reference better than the
simpler 1-finger K-d trees; however, these gains probably do not compensate for
the amount of memory that it needs, so that 1-finger K-d trees are more attractive
on practical grounds.

programme of the EU under contract IST-1999-14186 (ALCOM-FT) and the Spanish
Min. of Science and Technology project TIC2002-00190 (AEDRI II).



2 Preliminaries and Basic Definitions 

A standard K-d tree [1] for a set F of K-dimensional data points is a binary
tree in which: (a) each node contains a K-dimensional data point and has an
associated discriminant $j \in \{0, 1, \ldots, K-1\}$ cyclically assigned starting with
j = 0. Thus the root of the tree discriminates with respect to the first coordinate
(j = 0), its sons at level 1 discriminate w.r.t. the second coordinate (j = 1), and
in general, all nodes at level m discriminate w.r.t. coordinate j = m mod K; (b)
for each node $x = (x_0, x_1, \ldots, x_{K-1})$ with discriminant j, the following invariant
is true: any data point y in the left subtree satisfies $y_j < x_j$ and any data point
z in the right subtree satisfies $z_j > x_j$ (see Fig. 1).

Randomized relaxed K-d trees [6] (relaxed K-d trees, for short) are K-d trees
where the sequence of discriminants in a path from the root to any leaf is random
instead of cyclic. Hence, discriminants must be explicitly stored in the nodes.
Other variants of K-d trees use different alternatives to assign discriminants (for
instance, the squarish K-d trees of Devroye et al. [5]) or combine different methods
to partition the space (not just axis-parallel hyper-planes passing through
data coordinates as standard and relaxed K-d trees do).

We say that a K-d tree of size n is random if it is built by n insertions where
the points are independently drawn from a continuous distribution in $[0,1]^K$.
In the case of random relaxed K-d trees, the discriminants associated to the
internal nodes are uniformly and independently drawn from $\{0, \ldots, K-1\}$.

^ They actually apply to any variant of K-d trees, not just the two mentioned; some
additional but minor modifications would be necessary to adapt them to quad trees,
K-d tries, etc.

Fig. 1. A standard 2-d tree and the corresponding induced partition of $[0,1]^2$



An (orthogonal) range query is a K-dimensional hyper-rectangle Q. We shall
write $Q = [\ell_0, u_0] \times \cdots \times [\ell_{K-1}, u_{K-1}]$, with $\ell_i \le u_i$, for $0 \le i < K$. Alternatively,
a range query can be specified given its center z and the length of its edges
$\Delta_0, \Delta_1, \ldots, \Delta_{K-1}$, with $0 \le \Delta_i \le 1/2$, for $0 \le i < K$. We assume that
$-\Delta_i/2 \le z_i \le 1 + \Delta_i/2$, for $0 \le i < K$, so that a range query Q may fall partially outside
of $[0,1]^K$.

Range searching in any variant of K-d trees is straightforward. When visiting
a node x that discriminates w.r.t. the j-th coordinate, we must compare $x_j$
with the j-th range $[\ell_j, u_j]$ of the query. If the query range is totally above
(or below) that value, we must search only the right subtree (respectively, left)
of that node. If, on the contrary, $\ell_j \le x_j \le u_j$ then both subtrees must be
searched; additionally, we must check whether x falls inside the query
hyper-rectangle or not. This procedure continues recursively until empty subtrees are
reached.
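
The procedure just described can be sketched in a few lines. The following is a minimal illustration, not the C code used in the experiments; the names Node, insert and range_search are chosen here for the example, and ties between coordinates are assumed not to occur (points drawn from a continuous distribution).

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    info: Point                      # the K-dimensional key stored in the node
    discr: int                       # coordinate used to discriminate
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(t: Optional[Node], p: Point, depth: int = 0) -> Node:
    """Insert p into a standard K-d tree (cyclic discriminants)."""
    if t is None:
        return Node(p, depth % len(p))
    j = t.discr
    if p[j] < t.info[j]:
        t.left = insert(t.left, p, depth + 1)
    else:
        t.right = insert(t.right, p, depth + 1)
    return t


def range_search(t: Optional[Node], lo: Point, hi: Point, out: List[Point]) -> int:
    """Append to `out` every point inside [lo, hi]; return the number of visited nodes."""
    if t is None:
        return 0
    j, x = t.discr, t.info
    if hi[j] < x[j]:                 # query entirely below x_j: left subtree only
        return 1 + range_search(t.left, lo, hi, out)
    if lo[j] > x[j]:                 # query entirely above x_j: right subtree only
        return 1 + range_search(t.right, lo, hi, out)
    if all(l <= c <= h for l, c, h in zip(lo, x, hi)):
        out.append(x)                # x itself falls inside the query
    return 1 + range_search(t.left, lo, hi, out) + range_search(t.right, lo, hi, out)


# Usage: 200 random points in the unit square and one query rectangle.
rng = random.Random(1)
points = [(rng.random(), rng.random()) for _ in range(200)]
root = None
for p in points:
    root = insert(root, p)
reported: List[Point] = []
visited = range_search(root, (0.2, 0.3), (0.5, 0.6), reported)
overwork = visited - len(reported)   # visited nodes that were not reported
```

Returning the number of visited nodes makes it easy to measure the overwork, that is, the visited nodes that are not part of the answer.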

One important concept related to orthogonal range searches is that of bounding
box or bounding hyper-rectangle of a data point. Given an item x in a K-d tree
T, its bounding hyper-rectangle $B(x) = [\ell_0(x), u_0(x)] \times \cdots \times [\ell_{K-1}(x), u_{K-1}(x)]$
is the region of $[0,1]^K$ corresponding to the leaf that x replaced when it was
inserted into T. Formally, it is defined as follows: a) if x is the root of T then
$B(x) = [0,1]^K$; b) if $y = (y_0, \ldots, y_{K-1})$ is the father of x and it discriminates
w.r.t. the j-th coordinate and $x_j < y_j$ then $B(x) = [\ell_0(y), u_0(y)] \times
\cdots \times [\ell_j(y), y_j] \times \cdots \times [\ell_{K-1}(y), u_{K-1}(y)]$; c) if $y = (y_0, \ldots, y_{K-1})$ is the father
of x and it discriminates w.r.t. the j-th coordinate and $x_j > y_j$ then
$B(x) = [\ell_0(y), u_0(y)] \times \cdots \times [y_j, u_j(y)] \times \cdots \times [\ell_{K-1}(y), u_{K-1}(y)]$. The relation
of bounding hyper-rectangles with range search is established by the following
lemma.

Lemma 1 ([4,5]). A point x with bounding hyper-rectangle B(x) is visited by
a range search with query hyper-rectangle Q if and only if B(x) intersects Q.
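
The definition of B(x) and the statement of the lemma can be checked empirically. The sketch below is only an illustration under stated assumptions: boxes are computed at insertion time and stored as two corner tuples (the field names lo and hi are ours), the tree is a standard K-d tree, and the query is assumed to intersect $[0,1]^K$.

```python
import random
from dataclasses import dataclass
from typing import Optional, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    info: Point
    discr: int
    lo: Point                        # lower corner of B(x)
    hi: Point                        # upper corner of B(x)
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(t, p, lo, hi, depth=0):
    """Insert p; (lo, hi) is the bounding box of the position reached so far."""
    if t is None:
        return Node(p, depth % len(p), lo, hi)
    j, c = t.discr, t.info[t.discr]
    if p[j] < c:                     # case b): the upper bound in coordinate j becomes y_j
        t.left = insert(t.left, p, lo, hi[:j] + (c,) + hi[j + 1:], depth + 1)
    else:                            # case c): the lower bound in coordinate j becomes y_j
        t.right = insert(t.right, p, lo[:j] + (c,) + lo[j + 1:], hi, depth + 1)
    return t


def visited_nodes(t, qlo, qhi, acc):
    """Collect the nodes visited by the standard range search."""
    if t is None:
        return
    acc.append(t)
    j, c = t.discr, t.info[t.discr]
    if qhi[j] < c:
        visited_nodes(t.left, qlo, qhi, acc)
    elif qlo[j] > c:
        visited_nodes(t.right, qlo, qhi, acc)
    else:
        visited_nodes(t.left, qlo, qhi, acc)
        visited_nodes(t.right, qlo, qhi, acc)


def all_nodes(t):
    if t is not None:
        yield t
        yield from all_nodes(t.left)
        yield from all_nodes(t.right)


def intersects(n, qlo, qhi):
    """True if the closed box B(n) and the closed query overlap."""
    return all(l <= qh and ql <= h for l, h, ql, qh in zip(n.lo, n.hi, qlo, qhi))


rng = random.Random(5)
K = 3
root = None
for _ in range(300):
    p = tuple(rng.random() for _ in range(K))
    root = insert(root, p, (0.0,) * K, (1.0,) * K)

qlo, qhi = (0.4, 0.4, 0.4), (0.6, 0.6, 0.6)
acc = []
visited_nodes(root, qlo, qhi, acc)
visited_ids = {id(n) for n in acc}
intersecting_ids = {id(n) for n in all_nodes(root) if intersects(n, qlo, qhi)}
```

With these definitions the two sets of node identities coincide, as the lemma states.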

The cost of orthogonal range queries is usually measured as the number of 
nodes of the K-d tree visited during the search. It has two main parts: a term
corresponding to the unavoidable cost of reporting the result plus an overwork. 

Theorem 1 ([7]). Given a random K-d tree storing n uniformly and independently
drawn data points in $[0,1]^K$, the expected overwork $E[W_n]$ of an orthogonal
range search with edge lengths $\Delta_0, \ldots, \Delta_{K-1}$ such that $\Delta_i \to 0$ as
$n \to \infty$, $0 \le i < K$, and with center uniformly and independently drawn from
$Z_\Delta = [-\Delta_0/2, 1 + \Delta_0/2] \times \cdots \times [-\Delta_{K-1}/2, 1 + \Delta_{K-1}/2]$ is given by

$$E[W_n] = \sum_{1 \le j < K} c_j \cdot n^{\alpha(j/K)} + 2 \cdot (1 - \Delta_0) \cdots (1 - \Delta_{K-1}) \cdot \log n + O(1),$$

where $\alpha(x)$ and the $c_j$'s depend on the particular type of K-d trees.

In the case of standard K-d trees $\alpha(x) = 1 - x + \phi(x)$, where $\phi(x)$ is the
unique real solution of $(\phi(x) + 3 - x)^x (\phi(x) + 2 - x)^{1-x} - 2 = 0$; for any
$x \in [0,1]$, we have $\phi(x) < 0.07$. For relaxed K-d trees, $\alpha(x) = 1 - x + \phi(x)$
where $\phi(x) = (\sqrt{9 - 8x} - 3)/2 + x$. Of particular interest is the case where
$\Delta_i = \Theta(n^{-1/K})$, corresponding to the situation where each orthogonal range
search reports a constant number of points on the average. Then the cost is
dominated by the overwork and we have

E [i?„] = E [IT„] = + . . . + ^ 

A nearest neighbor query is a multidimensional point $q = (q_0, q_1, \ldots, q_{K-1})$
lying in $[0,1]^K$. The goal of the search is to find the point in the data structure
which is closest to q under a predefined distance measure.

There are several variants for nearest neighbor searching in K-d trees. One
of the simplest, which we will use for the rest of this paper, works as follows.
The initial closest point is the root of the tree. Then we traverse the tree as if
we were inserting q. When visiting a node x that discriminates w.r.t. the j-th
coordinate, we must compare $q_j$ with $x_j$. If $q_j$ is smaller than $x_j$ we follow the left
subtree, otherwise we follow the right one. At each step we must check whether
x is closer to q than the closest point seen so far and update the candidate
nearest neighbor accordingly. The procedure continues recursively until empty
subtrees are reached. If the hyper-sphere, say $B_q$, defined by the query q and
the candidate closest point is totally enclosed within the bounding boxes of the
visited nodes then the search is finished. Otherwise, we must visit recursively
the subtrees corresponding to nodes whose bounding box intersects but does not
enclose $B_q$.
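
A common recursive formulation of this search is sketched below. It is an illustration only: instead of testing the ball $B_q$ against stored bounding boxes, it prunes the far subtree whenever the discriminating hyperplane is at least as far from q as the current candidate, which is an assumption of this sketch rather than the exact formulation in the text. The Euclidean distance is assumed and all names are ours.

```python
import math
import random
from dataclasses import dataclass
from typing import Optional, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    info: Point
    discr: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(t, p, depth=0):
    """Insert p into a standard K-d tree (cyclic discriminants)."""
    if t is None:
        return Node(p, depth % len(p))
    j = t.discr
    if p[j] < t.info[j]:
        t.left = insert(t.left, p, depth + 1)
    else:
        t.right = insert(t.right, p, depth + 1)
    return t


def nearest(t, q, best=None, best_d=math.inf):
    """Return (closest point found, its distance) for the subtree rooted at t."""
    if t is None:
        return best, best_d
    d = math.dist(q, t.info)
    if d < best_d:                   # t improves the candidate nearest neighbor
        best, best_d = t.info, d
    j, c = t.discr, t.info[t.discr]
    # Descend first on the side where q would be inserted.
    near, far = (t.left, t.right) if q[j] < c else (t.right, t.left)
    best, best_d = nearest(near, q, best, best_d)
    # The other side can only matter if the ball around q crosses the hyperplane.
    if abs(q[j] - c) < best_d:
        best, best_d = nearest(far, q, best, best_d)
    return best, best_d


rng = random.Random(2)
points = [tuple(rng.random() for _ in range(3)) for _ in range(300)]
root = None
for p in points:
    root = insert(root, p)
query = (0.3, 0.7, 0.5)
nn_point, nn_dist = nearest(root, query)
```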

The performance of nearest neighbor search is similar to the overwork in
range search. Given a random K-d tree, the expected cost $E[NN_n]$ of a nearest
neighbor query q uniformly drawn from $[0,1]^K$ is given by

$$E[NN_n] = \Theta(n^{\rho} + \log n),$$

where $\rho = \max_{0 \le s \le K} (\alpha(s/K) - 1 + s/K)$. In the case of standard K-d trees
$\rho \in (0.0615, 0.064)$. More precisely, for K = 2 we have $\rho = (\sqrt{17} - 4)/2 \approx 0.0615528$
and for K = 3 we have $\rho \approx 0.0615254$, which is minimal. For relaxed K-d trees
$\rho \in (0.118, 0.125)$. When K = 2 we have $\rho = \sqrt{5}/2 - 1 \approx 0.118$, which is minimal,
whereas for K = 8 we have $\rho = 1/8$, which is maximal.
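
The exponents quoted above can be reproduced numerically from the definitions of $\phi$ for standard and relaxed K-d trees, since $\alpha(s/K) - 1 + s/K = \phi(s/K)$. The following sketch is a check under that reading; the function names and the use of bisection on [0, 1] are choices made for this illustration.

```python
import math


def phi_standard(x: float, iters: int = 100) -> float:
    """Root of (phi + 3 - x)^x * (phi + 2 - x)^(1 - x) - 2 = 0, found by bisection."""
    lo, hi = 0.0, 1.0                # the left-hand side is increasing in phi
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (mid + 3 - x) ** x * (mid + 2 - x) ** (1 - x) - 2 > 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


def phi_relaxed(x: float) -> float:
    """Closed form for relaxed K-d trees."""
    return (math.sqrt(9 - 8 * x) - 3) / 2 + x


def rho(phi, K: int) -> float:
    """rho = max over 0 <= s <= K of alpha(s/K) - 1 + s/K = phi(s/K)."""
    return max(phi(s / K) for s in range(K + 1))


rho_std_2 = rho(phi_standard, 2)     # compare with (sqrt(17) - 4) / 2
rho_std_3 = rho(phi_standard, 3)
rho_rel_2 = rho(phi_relaxed, 2)      # compare with sqrt(5) / 2 - 1
rho_rel_8 = rho(phi_relaxed, 8)      # attained at s = 5
```

For relaxed K-d trees the maximum of $\phi$ over [0, 1] is at x = 5/8, which is why K = 8 (with s = 5) reaches the largest possible value.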

3 Finger K-d Trees

In this section we introduce two different schemes of fingered K-d trees. We call 
the first and simpler scheme 1-finger K-d trees; we augment the data structure 
with one finger pointer. The second scheme is called multiple finger K-d tree (or 
m-finger K-d tree, for short). Each node of the new data structure is equipped 
with two additional pointers or fingers, each pointing to descendent nodes in 
the left and right subtrees, respectively. The search in this case proceeds by 
recursively using the fingers whenever possible. 



3.1 One-Finger K-d Trees 

A one-finger K-d tree (1-finger K-d tree) for a set F of K-dimensional data
points is a K-d tree in which: (a) each node contains its bounding box and 
a pointer to its father; (b) there is a pointer called finger that points to an 
arbitrary node of the tree. 

The finger is initially pointing to the root but it is updated after each individual
search. Consider first orthogonal range searches. The orthogonal range
search algorithm starts the search at the node x pointed to by the finger f.
Let B(x) be the bounding box of node x and Q the range query. If $Q \subseteq B(x)$
then all the points to be reported must necessarily be in the subtree rooted
at x. Thus, the search algorithm proceeds from x down following the classical
range search algorithm. Otherwise, some of the points that are inside the query
Q can be stored in nodes which are not descendants of x. Hence, in this situation
the algorithm backtracks until it finds the first ancestor y of x such that
B(y) completely contains Q. Once y has been found the search proceeds as in
the previous case. The finger is updated to point to the first node where the
range search must follow recursively into both subtrees (or to the last visited
node if no such node exists). In other terms, f is updated to point to the node
whose bounding box completely contains Q and none of the bounding boxes of
its descendants does. The idea is that if consecutive queries Q and Q' are close
in geometric terms then either the bounding box B(x) that contains Q also
contains Q' or only a limited amount of backtracking suffices to find the appropriate
ancestor y to go on with the usual range searching procedure. Of course, the
finger is initialized to point to the tree’s root before the first search is made.
Algorithm 1 describes the orthogonal range search in 1-finger K-d trees. It invokes
the standard range_search algorithm once the appropriate starting point has
been found. We use the notation p → field to refer to the field field in the node
pointed to by p. For simplicity, the algorithm assumes that each node stores its
bounding box; however, it is not difficult to modify the algorithm so that only
the nodes in the path from the root to f contain this information or to use an
auxiliary stack to store the bounding boxes of the nodes in the path from the
root to the finger. Additionally, the explicit pointers to the father can be avoided
by using pointer reversal plus a pointer to the finger’s father.
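
The scheme can be sketched as follows. This is an illustration, not the authors' implementation: it is written iteratively, every node stores its bounding box and father pointer, the finger falls back to the last visited node when the descent reaches an empty subtree, and containment of Q in B(x) is tested strictly so that no ancestor lying on the boundary of B(x) can belong to the answer. All names are ours.

```python
import random


class Node:
    def __init__(self, info, discr, lo, hi, father=None):
        self.info, self.discr = info, discr
        self.lo, self.hi = lo, hi            # corners of the bounding box B(x)
        self.father, self.left, self.right = father, None, None


def insert(root, p):
    """Standard K-d tree insertion that also records boxes and father pointers."""
    K = len(p)
    if root is None:
        return Node(p, 0, (0.0,) * K, (1.0,) * K)
    t = root
    while True:
        j, c = t.discr, t.info[t.discr]
        if p[j] < c:
            if t.left is None:
                t.left = Node(p, (j + 1) % K, t.lo, t.hi[:j] + (c,) + t.hi[j + 1:], t)
                return root
            t = t.left
        else:
            if t.right is None:
                t.right = Node(p, (j + 1) % K, t.lo[:j] + (c,) + t.lo[j + 1:], t.hi, t)
                return root
            t = t.right


def inside(p, qlo, qhi):
    return all(l <= c <= h for l, c, h in zip(qlo, p, qhi))


def range_search(t, qlo, qhi, out):
    """Standard range search in the subtree rooted at t."""
    if t is None:
        return
    j, c = t.discr, t.info[t.discr]
    if qhi[j] < c:
        range_search(t.left, qlo, qhi, out)
    elif qlo[j] > c:
        range_search(t.right, qlo, qhi, out)
    else:
        if inside(t.info, qlo, qhi):
            out.append(t.info)
        range_search(t.left, qlo, qhi, out)
        range_search(t.right, qlo, qhi, out)


def box_contains(n, qlo, qhi):
    """Strict containment of the query in B(n) (an assumption of this sketch)."""
    return all(l < ql and qh < h for l, h, ql, qh in zip(n.lo, n.hi, qlo, qhi))


def one_finger_range_search(f, qlo, qhi, out):
    """Answer the query starting at finger f; return the updated finger."""
    while f.father is not None and not box_contains(f, qlo, qhi):
        f = f.father                         # backtrack towards the root
    while True:                              # descend while only one side is needed
        j, c = f.discr, f.info[f.discr]
        if qhi[j] < c:
            nxt = f.left
        elif qlo[j] > c:
            nxt = f.right
        else:
            break
        if nxt is None:
            return f                         # last visited node
        f = nxt
    if inside(f.info, qlo, qhi):
        out.append(f.info)
    range_search(f.left, qlo, qhi, out)
    range_search(f.right, qlo, qhi, out)
    return f
```

A sequence of nearby queries is then answered by threading the returned finger into the next call, starting with the root as the initial finger.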

The single finger is exploited for nearest neighbor searches much in the same
vein. Let q be the query and let x be the node pointed to by the finger f. Initially

Algorithm 1 The orthogonal range search algorithm for 1-finger K-d trees.
▷ Initial call: f := one_finger_range_search(f, Q)
function one_finger_range_search(f, Q)
    if f = nil then return f
    B := f → bounding_box
    if Q ⊄ B then ▷ Backtrack
        return one_finger_range_search(f → father, Q)
    x := f → info; j := f → discr
    if Q.u[j] < x[j] then
        return one_finger_range_search(f → left, Q)
    if Q.l[j] > x[j] then
        return one_finger_range_search(f → right, Q)
    if x ∈ Q then Add x to result
    range_search(f → left, Q)
    range_search(f → right, Q)
    return f
end



f will point to the root of the tree, but on successive searches it will point to the
last closest point reported. The first step of the algorithm is then to calculate the
distance d between x and q and to determine the ball with center q and radius d.
If this ball is completely included in the bounding box of x then the nearest neighbor
search algorithm proceeds down the tree exactly in the same way as the standard
nearest neighbor search algorithm. If, on the contrary, the ball is not included in
B(x), the algorithm backtracks until it finds the least ancestor y whose bounding
box completely contains the ball. Then the algorithm continues as the standard
nearest neighbor search. Algorithm 2 describes the nearest neighbor algorithm
for 1-finger K-d trees; notice that it behaves just as the standard nearest neighbor
search once the appropriate node where to start has been found.
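
The steps above can be sketched as follows, under a few assumptions that are ours: the Euclidean distance is used, the backtracking is written as a loop that stops at the root, and the standard search is the usual recursive one with hyperplane pruning, started with the finger as initial candidate. It is a simplified illustration and does not follow Algorithm 2 line by line.

```python
import math
import random


class Node:
    def __init__(self, info, discr, lo, hi, father=None):
        self.info, self.discr = info, discr
        self.lo, self.hi = lo, hi            # corners of the bounding box B(x)
        self.father, self.left, self.right = father, None, None


def insert(root, p):
    """Standard K-d tree insertion that also records boxes and father pointers."""
    K = len(p)
    if root is None:
        return Node(p, 0, (0.0,) * K, (1.0,) * K)
    t = root
    while True:
        j, c = t.discr, t.info[t.discr]
        if p[j] < c:
            if t.left is None:
                t.left = Node(p, (j + 1) % K, t.lo, t.hi[:j] + (c,) + t.hi[j + 1:], t)
                return root
            t = t.left
        else:
            if t.right is None:
                t.right = Node(p, (j + 1) % K, t.lo[:j] + (c,) + t.lo[j + 1:], t.hi, t)
                return root
            t = t.right


def nearest(t, q, best, best_d):
    """Standard recursive NN search in the subtree of t, given an initial candidate."""
    if t is None:
        return best, best_d
    d = math.dist(q, t.info)
    if d < best_d:
        best, best_d = t, d
    j, c = t.discr, t.info[t.discr]
    near, far = (t.left, t.right) if q[j] < c else (t.right, t.left)
    best, best_d = nearest(near, q, best, best_d)
    if abs(q[j] - c) < best_d:               # the ball crosses the hyperplane
        best, best_d = nearest(far, q, best, best_d)
    return best, best_d


def ball_in_box(q, r, n):
    """True if the ball of center q and radius r lies inside B(n)."""
    return all(l <= c - r and c + r <= h for l, h, c in zip(n.lo, n.hi, q))


def one_finger_nn(f, q):
    """Return the node holding a nearest neighbor of q; it becomes the new finger."""
    d = math.dist(q, f.info)                 # distance from q to the previous answer
    y = f
    while y.father is not None and not ball_in_box(q, d, y):
        y = y.father                         # backtrack to the least enclosing ancestor
    best, _ = nearest(y, q, f, d)
    return best
```

Starting the standard search with the finger as candidate means that, when the previous answer is still the nearest point, only the subtree below the enclosing ancestor is examined.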



3.2 Multiple Finger K-d Trees 

A multiple-finger K-d tree (m-finger K-d tree) for a set F of K-dimensional data
points is a K-d tree in which each node contains its bounding box, a pointer to
its father and two pointers, fleft and fright, pointing to two nodes in its left
and right subtrees, respectively.

Given an m-finger K-d tree T and an orthogonal range query Q, the orthogonal
range search in T returns the points in T which fall inside Q as usual, but it
also modifies the finger pointers of the nodes in T to improve the response time
of future orthogonal range searches. The algorithm for m-finger search trees
follows by recursively applying the 1-finger K-d tree scheme at each stage of the
orthogonal range search. The fingers of visited nodes are updated as the
search proceeds; we have considered that if a search continues in just one subtree
of the current node the finger corresponding to the non-visited subtree should

Algorithm 2 The nearest neighbor search algorithm for 1-finger K-d trees.
▷ Initial call: nn := T → root; md := dist(nn → info, q);
▷ f := one_finger_NN(f, q, md, nn)
function one_finger_NN(f, q, min_dist, nn)
    if f = nil then return f
    x := f → info
    d := dist(q, x)
    B := f → bounding_box
    if d < min_dist then
        min_dist := d
        nn := f
    if BALL(q, min_dist) ⊄ B then ▷ Backtrack
        return one_finger_NN(f → father, q, min_dist, nn)
    j := f → discr
    if q[j] < x[j] then
        nn := one_finger_NN(f → left, q, min_dist, nn)
        other := f → right
    else
        nn := one_finger_NN(f → right, q, min_dist, nn)
        other := f → left
    if q[j] − min_dist < x[j] and q[j] + min_dist > x[j] then
        nn := one_finger_NN(other, q, min_dist, nn)
    return nn
end



be reset, for it was not providing useful information. The pseudo-code for this 
algorithm is given as Algorithm 3. 

It is important to emphasize that while it is not too difficult to code 1-finger
search trees using O(log n) additional memory^, the implementation of m-finger
search trees does require O(n) additional memory for the father pointer, finger
pointers and bounding boxes, that is, a total of 3n additional pointers and 2n
K-dimensional points. This could be a high price for the improvement in search
performance which, perhaps, might not be worth paying.

4 Locality Models and Experimental Results 

Both 1-finger and m-finger K-d trees try to exploit locality of reference in long sequences
of queries, so one of the main aspects of this work was to devise meaningful
models on which we could carry out the experiments. The programs used in the
experiments described in this section have been written in C, using the GNU
compiler gcc-2.95.4. The experiments themselves have been run on a computer
with an Intel Pentium 4 CPU at 2.8 GHz with 1 Gb of RAM and 512 Kb of cache
memory.

^ Actually, the necessary additional space is proportional to the height of the K-d
tree, which on average is O(log n) but can be as much as O(n) in the worst case.

Algorithm 3 The orthogonal range search algorithm in an m-finger K-d tree.
▷ Initial call: multiple_finger_range_search(T → root, Q)
function multiple_finger_range_search(f, Q)
    if f = nil then return f
    B := f → bounding_box
    if Q ⊄ B then ▷ Backtrack
        f → fleft := f → left
        f → fright := f → right
        return multiple_finger_range_search(f → father, Q)
    x := f → info; j := f → discr
    if Q.u[j] < x[j] then
        f → fright := f → right
        f → fleft := multiple_finger_range_search(f → fleft, Q)
        if f → fleft = nil then return f
        if f → fleft = f then f → fleft := f → left
        return f → fleft
    if Q.l[j] > x[j] then
        f → fleft := f → left
        f → fright := multiple_finger_range_search(f → fright, Q)
        if f → fright = nil then return f
        if f → fright = f then f → fright := f → right
        return f → fright
    if x ∈ Q then Add x to result
    f → fleft := multiple_finger_range_search(f → fleft, Q)
    if f → fleft = nil then return f
    if f → fleft = f then f → fleft := f → left
    f → fright := multiple_finger_range_search(f → fright, Q)
    if f → fright = nil then return f
    if f → fright = f then f → fright := f → right
    return f
end



4.1 The Models 

In the case of orthogonal range search, given a size n, and a dimension K,
we generate T = 1000 sets of n K-dimensional points drawn uniformly and
independently at random in $[0,1]^K$. Each point of each set is inserted into two
initially empty trees, so that we get a random standard K-d tree $T_s$ and a
random relaxed K-d tree $T_r$ of size n which contain the same information. For
each pair $(T_s, T_r)$, we generate S = 300 sequences of Q = 100 orthogonal range
queries and make the corresponding search with the standard and the fingered
variants of the algorithm, collecting the basic statistics on the performance of
the search.

We have used in all experiments fixed size queries: the length of the K edges
of each query was $\Delta = 0.01$. Since we have run experiments with up to n = 50000
elements per tree, the number of reported points by any range search is typically
small (from 1 to 5 reported points). To model locality, we introduced the
notion of δ-close queries: given two queries Q and Q' with identical edge lengths
$\Delta_0, \Delta_1, \ldots, \Delta_{K-1}$, we say that Q and Q' are δ-close if their respective centers
z and z' satisfy $z - z' = (d_0, d_1, \ldots, d_{K-1})$ and $|d_j| \le \delta \cdot \Delta_j$, for any $0 \le j < K$.
The sequences of δ-close queries were easily generated at random by choosing the
initial center $z_0$ at random and setting each successive center $z_{m+1} = z_m + d_m$
for some randomly generated vector $d_m$; in particular, the i-th coordinate of $d_m$
is generated uniformly at random in $[-\delta \cdot \Delta_i, \delta \cdot \Delta_i]$.

The real-valued parameter δ is a simple way to capture into a single number
the degree of locality of reference. If $\delta < 1$ then δ-close queries must overlap at
least a fraction $(1 - \delta)^K$ of their volume. When $\delta \to \infty$ (in fact it suffices to set
$\delta = \max\{\Delta_i^{-1}\}$) we have no locality.
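
A sequence of δ-close range queries can be generated along the lines described above. The sketch below is an illustration with names of our choosing; it assumes that the initial center is drawn uniformly from the region of admissible centers and that later centers are not clamped to that region, a detail on which the text is silent. It also computes the overlap fraction of two consecutive queries, which can be compared with the bound $(1 - \delta)^K$.

```python
import random
from typing import List, Sequence, Tuple


def close_query_centers(num: int, delta: float, edges: Sequence[float],
                        rng: random.Random) -> List[Tuple[float, ...]]:
    """Centers z_0, z_1, ... with |z_{m+1}[i] - z_m[i]| <= delta * edges[i]."""
    z = [rng.uniform(-e / 2, 1 + e / 2) for e in edges]   # initial center z_0
    centers = [tuple(z)]
    for _ in range(num - 1):
        z = [zi + rng.uniform(-delta * e, delta * e) for zi, e in zip(z, edges)]
        centers.append(tuple(z))
    return centers


def overlap_fraction(z1: Sequence[float], z2: Sequence[float],
                     edges: Sequence[float]) -> float:
    """Volume of the intersection of two equal-sized queries over the query volume."""
    frac = 1.0
    for a, b, e in zip(z1, z2, edges):
        frac *= max(0.0, e - abs(a - b)) / e
    return frac


def to_box(z: Sequence[float], edges: Sequence[float]):
    """Lower and upper corners of the query with center z."""
    lo = tuple(c - e / 2 for c, e in zip(z, edges))
    hi = tuple(c + e / 2 for c, e in zip(z, edges))
    return lo, hi


rng = random.Random(0)
K, delta = 3, 0.25
edges = (0.01,) * K
centers = close_query_centers(100, delta, edges, rng)
fractions = [overlap_fraction(a, b, edges) for a, b in zip(centers, centers[1:])]
```

In each coordinate the two intervals share at least a fraction $1 - \delta$ of their length, which gives the product bound over the K coordinates.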

For nearest neighbor searches, the experimental setup was pretty much the
same as for orthogonal range search; for each pair $(T_s, T_r)$ of randomly built K-d trees,
we perform nearest neighbor search on each of the Q = 100 queries of each of
the S = 300 generated sequences.

Successive queries q and q' are said to be δ-close if $q - q' = (d_0, d_1, \ldots, d_{K-1})$
and $|d_j| \le \delta$, for any $0 \le j < K$. It is interesting to note that the locality
of reference parameter δ now bounds the distance between queries in absolute
terms, whereas it is used in relative terms in the case of orthogonal range queries.
As such, only values in the range $[0, \sqrt{K}]$ are meaningful, although we find it
convenient to say $\delta \to \infty$ to indicate that there is no locality of reference.



4.2 The Experiments 

Range Queries. Due to space limitations, we show here only the results
corresponding to relaxed K-d trees; the results for standard K-d trees are qualitatively
identical. To facilitate the comparison between the standard algorithms
and their fingered counterparts we use the ratio of the respective overworks;
namely, if $W_n^{(1)}$ denotes the overwork of 1-finger search, $W_n^{(m)}$ denotes the
overwork of m-finger search and $W_n^{(0)}$ denotes the overwork of standard search (no
fingers), we will use the ratios $\tau_n^{(1)} = W_n^{(1)}/W_n^{(0)}$ and $\tau_n^{(m)} = W_n^{(m)}/W_n^{(0)}$. Recall
that the overwork is the number of visited nodes during a search minus the number
of nodes (points) which satisfied the range query. The graphs of Figures 2,
3 and 4 depict $\tau_n^{(1)}$ and $\tau_n^{(m)}$ for δ = 0.25, 0.75, 2, respectively.

All the plots confirm that significant savings can be achieved thanks to the
use of fingers; in particular, m-finger K-d trees do much better than 1-finger K-d
trees for all values of K and δ. As δ increases the savings w.r.t. non-fingered
search decrease, but even for δ = 2 the overwork of 1-finger search is only about
60% of the overwork of the standard search.

As we already expected, the performance of both 1-finger K-d trees and m-finger
K-d trees heavily depends on the locality parameter δ, a fact that is well
illustrated by Figures 5 and 6. The first one shows the plot of the overwork
of standard m-finger K-d trees for various values of δ and dimensions. In
particular, when the dimension increases we shall expect big differences in the
savings that fingered search yields as δ varies; for lower dimensions, the variability
of the overwork with δ is not so “steep”. Similar phenomena can be observed for relaxed
m-finger K-d trees and standard and relaxed 1-finger K-d trees. On the other
hand, Figure 6 shows the variation of the overwork ratios as functions of δ for
relaxed 1-finger and m-finger K-d trees.




Fig. 2. Overwork ratios for relaxed 1-finger K-d trees (solid line) and m-finger K-d
trees (dashed line), for δ = 0.25, K = 2 (up left), K = 3 (up right), K = 4 (down left),
and K = 5 (down right)



Taking into account the way the algorithm works, we conjecture that 1-finger
search reduces by a constant factor the logarithmic term in the overwork. Thus,
if standard search has overwork $\approx \sum_{0<j<K} \beta_j n^{\phi_j} + \gamma \log n$ then the overwork
$W_n^{(1)}$ of 1-finger search satisfies

$$W_n^{(1)} \approx \sum_{0<j<K} \beta_j n^{\phi_j} + \gamma' \log n, \qquad (1)$$

with $\gamma' = \gamma'(\delta)$. However, since the φ’s and β’s are quite small it is rather difficult
to disprove this hypothesis on an experimental basis; besides it is fully consistent
with the results that we have obtained.

On the other hand, and again, following our intuitions on its modus operandi,
we conjecture that the overwork of m-finger search is equivalent to skipping
the initial logarithmic path and then performing a standard range search on a
random tree whose size is a fraction of the total size, say n/x, for some x > 1
(basically, the m-finger search behaves as the standard search, but skips more or
less long intermediate chains of nodes and their subtrees). In other words, the
overwork $W_n^{(m)}$ of m-finger search would satisfy

$$W_n^{(m)} \approx \sum_{0<j<K} \beta'_j n^{\phi_j} + \gamma' \log n, \qquad (2)$$

for some β'’s and γ' which depend on δ (but γ' here is not the same as for 1-finger
search). In this case we face the same problems in order to find experimental
evidence against the conjecture.

Table 1 summarizes the values of $\beta = \beta_1$ and γ that we obtain by finding
the best-fit curve for the experimental results of relaxed 2-d trees. It is worth
recalling here that the theoretical analysis in [7] predicts for the overwork of
standard search in relaxed 2-d trees the following values: $\phi(1/2) = (\sqrt{5} - 1)/2 \approx
0.618033989$, $\beta = 4\Delta(1 - \Delta) \approx 0.03828681802$ and $\gamma = 2(1 - \Delta)^2 =
1.9602$.



Table 1. Best-fit β and γ for relaxed 2-d trees
1.103
0.039
0.908
2.00
1.151
0.039

Nearest Neighbor Queries. The curves in Figure 7 show the performance
of relaxed 1-finger K-d trees. There, we plot the ratio of the cost using 1-finger
nearest neighbor search to the cost using no fingers. For each dimension K
(K = 2, 3, 4, 5), the solid line curve corresponds to nearest neighbor search with
δ = 0.01, whereas the dashed line curve corresponds to δ = 0.005.

It is not a surprise that when we have better locality of reference (a smaller
δ) the performance improves. It is more difficult to explain why the variability
on δ is smaller as the dimension increases. The qualitatively different behavior
for K = 2, K = 3 and K > 3 is also surprising. For K = 2 the ratio of the
costs increases as n increases until it reaches some stable value (e.g., roughly
90% when δ = 0.005). For K = 3 we have rather different behavior when we
pass from δ = 0.005 to δ = 0.01. For K = 4, K = 5 and K = 6 we have the same
qualitative behavior in all cases^: a decrease of the ratio as n grows until the
ratio reaches a limit value. A similar phenomenon occurs for K = 2 and K = 3
provided that δ is even smaller than 0.005.

^ The plot for K = 6 is not shown in the figure.

We did not find significant improvements of 1-finger search with respect to
standard search in any of our experiments; in particular, the cost of 1-finger
nearest neighbor search was not below 90% of the standard cost even for large
dimensions and small (but not unrealistic) values of δ.

A preliminary explanation for the observed behavior is that unless the locality
of reference is high, the NN search will have to backtrack a significant
fraction of the path that it had skipped thanks to the finger. On the other
hand, we define the locality parameter δ in an absolute manner, but the actual
degree of locality that it expresses depends on the dimension. For instance, take
n = 10000 and δ = 0.01. If K = 2 then we would expect to find one point in
an $L_\infty$-sphere of radius δ; but we would find only 0.01 points in an $L_\infty$-sphere of
radius δ if K = 3, etc. Hence, many NN searches in a large dimension and with
some reasonably small value of δ will have the same result as the immediately
preceding query and 1-finger search does pretty well under that (easy) situation.
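
The figures quoted in this example are consistent with counting the expected number of uniformly distributed points in a region of volume $\delta^K$, that is, $n \cdot \delta^K$. The following lines reproduce them under that reading, which is an assumption of this sketch (a cube of side 2δ would give values larger by a factor of $2^K$); the function name is ours.

```python
def expected_points(n: int, side: float, K: int) -> float:
    """Expected number of n uniform points of [0,1]^K falling in a cube of the given side."""
    return n * side ** K


n, delta = 10000, 0.01
per_dimension = {K: expected_points(n, delta, K) for K in (2, 3, 4, 5)}
```

Each additional dimension divides the expected count by 1/δ = 100, so for a fixed δ the queries of a sequence quickly become much closer to each other than to any data point.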




Fig. 7. Nearest neighbor queries in relaxed 1-finger K-d trees for δ = 0.01 (solid line)
and δ = 0.005 (dashed line), K = 2 (up left), K = 3 (up right), K = 4 (down left), and
K = 5 (down right)

Abstract. A homogeneous set is a non-trivial, proper subset of a graph’s
vertices such that all its elements present exactly the same outer neighborhood.
Given two graphs, $G_1(V, E_1)$, $G_2(V, E_2)$, we consider the problem
of finding a sandwich graph $G_S(V, E_S)$, with $E_1 \subseteq E_S \subseteq E_2$, which
contains a homogeneous set, in case such a graph exists. This is called
the Homogeneous Set Sandwich Problem (HSSP). We give an $O(n^{3.5})$
deterministic algorithm, which updates the known upper bounds for this
problem, and an $O(n^3)$ Monte Carlo algorithm as well. Both algorithms,
which share the same underlying idea, are quite easy to implement
on the computer.

1 Introduction 

Given two graphs $G_1(V, E_1)$, $G_2(V, E_2)$ such that $E_1 \subseteq E_2$, a
sandwich problem with input pair $(G_1, G_2)$ consists in finding a sandwich
graph $G_s(V, E_s)$, with $E_1 \subseteq E_s \subseteq E_2$, which has a desired
property $\Pi$ [6]. In this paper, the property we are interested in is the
ownership of a homogeneous set. A homogeneous set $H$, in a graph $G(V, E)$, is
a subset of $V$ such that (i) $2 \le |H| \le |V| - 1$ and (ii) for all
$v \in V \setminus H$, either $(v, v') \in E$ is true for all $v' \in H$ or
$(v, v') \notin E$ is true for all $v' \in H$. In other words, a homogeneous set
$H$ is a subset of $V$ such that the outside-$H$ neighborhood of all its
vertices is the same and which also satisfies the necessary, above-mentioned
size constraints. A sandwich homogeneous set of a pair $(G_1, G_2)$ is a
homogeneous set for at least one among all possible sandwich graphs for
$(G_1, G_2)$.
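
The definition of a homogeneous set in a single graph can be checked directly. The following is a minimal Python sketch, assuming the graph is stored as a dictionary `adj` that maps each vertex to the set of its neighbors; the function name and this representation are illustrative choices, not taken from the paper.

```python
def is_homogeneous_set(adj, H):
    """Check conditions (i) and (ii) of the definition for H.

    adj: dict mapping every vertex to the set of its neighbours
         (an assumed representation, not the paper's data structure).
    """
    H = set(H)
    n = len(adj)
    # (i) size constraint: 2 <= |H| <= |V| - 1
    if not 2 <= len(H) <= n - 1:
        return False
    # (ii) every vertex outside H is adjacent to all of H or to none of H
    for v in adj:
        if v in H:
            continue
        seen = adj[v] & H
        if seen and seen != H:
            return False
    return True


# Vertices 0 and 1 are both adjacent to 2 and both non-adjacent to 3.
adj = {0: {2}, 1: {2}, 2: {0, 1}, 3: set()}
print(is_homogeneous_set(adj, {0, 1}))  # True
print(is_homogeneous_set(adj, {0, 2}))  # False: vertex 1 sees 2 but not 0
```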

Graph sandwich problems have attracted much attention lately, arising from
many applications and as a natural generalization of recognition problems
[1, 3, 5, 6, 7, 8, 9].

The importance of homogeneous sets in the context of graph decomposition
has been well acknowledged, especially in the context of perfect graphs [10].
There are many algorithms which find homogeneous sets quickly in a single graph.
The most efficient one is due to McConnell and Spinrad [11] and has $O(|E|)$
time complexity.

On the other hand, the known algorithms for the homogeneous set sandwich
problem are far less efficient. The first polynomial time algorithm was presented
by Cerioli et al. [1] and has $O(n^4)$ time complexity (where $n = |V|$, as
throughout the whole text). We refer to it as the Exhaustive Bias Envelopment
algorithm (EBE algorithm, for short), as in [2]. An $O(\Delta n^2)$ algorithm
(where $\Delta$ stands for the maximum vertex degree in $G_1$) has been found by
Tang et al. [12], but in [4, 2] it is proved incorrect. Although all efforts to
correct Tang et al.’s algorithm (referred to as the Bias Graph Components
algorithm, in [2]) have been in vain, some of its ideas were used, in [4, 2], to
build a hybrid algorithm, inspired by both [1] and [12]. This one has been called
the Two-Phase algorithm (2-P algorithm, for short) and currently sets the HSSP’s
upper bounds at its $O(m_1 \overline{m_2})$ time complexity, where $m_1$ and
$\overline{m_2}$ respectively refer to the number of edges in $G_1$ and the
number of edges not in $G_2$.

After defining some concepts and auxiliary procedures in Section 2, we
present, in Section 3, a new $O(n^{3.5})$ deterministic algorithm for the HSSP.
It offers a good alternative to the 2-P algorithm (whose time complexity is not
better than $O(n^4)$ if we express it only as a function of $n$) when dealing
with dense input graphs, whereas the 2-P algorithm would remain the best choice
when sparse graphs are dealt with. Besides, Section 4 is devoted to a fast,
randomized Monte Carlo algorithm, which solves this problem in $O(n^3)$ time
with whatever desired error ratio.



2 Bias Envelopment 

We define the bias set $B(H)$ of a vertex subset $H$ as the set of vertices
$b \notin H$ such that $(b, v_i) \in E_1$ and $(b, v_j) \notin E_2$, for some
$v_i, v_j \in H$. Such vertices $b$ are called bias vertices of the set $H$ [12].
It is easy to see that $H$, with $2 \le |H| \le n - 1$, is a sandwich
homogeneous set if and only if $B(H) = \emptyset$.
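
The bias set translates almost literally into code. The sketch below assumes both input graphs are given as adjacency dictionaries over the same vertex set (`adj1` for the mandatory edges of the first graph, `adj2` for the allowed edges of the second); these names and the representation are assumptions made for illustration.

```python
def bias_set(adj1, adj2, H):
    """Return B(H) for the pair of graphs given by adj1 and adj2.

    A vertex b outside H is a bias vertex when it has a mandatory edge
    (in E1) to some vertex of H and a forbidden edge (not in E2) to some
    vertex of H.
    """
    H = set(H)
    return {
        b for b in adj1
        if b not in H
        and adj1[b] & H      # some (b, v_i) belongs to E1
        and H - adj2[b]      # some (b, v_j) does not belong to E2
    }


# Path 0 - 1 - 2 - 3 used as both graphs (no optional edges).
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(bias_set(path, path, {0, 1}))          # {2}
print(sorted(bias_set(path, path, {1, 2})))  # [0, 3]
```

An empty result for a set of admissible size means the set is a sandwich homogeneous set, which is the test the envelopment procedures rely on.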

It is proved in [1] that any sandwich homogeneous set containing the set of
vertices $H$ should also contain $B(H)$. This result, along with the fact that no
homogeneous sets are allowed with fewer than two vertices, gave birth in that
same paper to a procedure which we call Bias Envelopment [2]. The Bias
Envelopment procedure answers whether a given pair of vertices is contained in
any sandwich homogeneous set of the input instance. The procedure starts from an
initial sandwich homogeneous set candidate $H_1 = \{v_1, v_2\}$ and successively
computes $H_{t+1} = H_t \cup B(H_t)$ until either (i) $B(H_t) = \emptyset$,
whereby $H_t$ is a sandwich homogeneous set and it answers yes; or (ii)
$|H_t| + |B(H_t)| = n$, when it answers no, meaning that there is no sandwich
homogeneous set containing $\{v_1, v_2\}$. The Bias Envelopment procedure runs in
$O(n^2)$ time, granted some appropriate data structures are used, as described
in [1].

The EBE algorithm, presented in [1], tries to find a sandwich homogeneous
set exhaustively, running the Bias Envelopment procedure on all $n(n-1)/2$ pairs
of the input graphs’ vertices, in the worst case. Thus, the time complexity of the
EBE algorithm is $O(n^4)$.

IncompleteBiasEnvelopment ($G_1(V, E_1)$, $G_2(V, E_2)$, $v_1$, $v_2$, $k$)

1. $H \leftarrow \{v_1, v_2\}$
2. while $|H| \le k$
   2.1. if $B(H) = \emptyset$ and $|H| < |V|$
        2.1.1. return $H$ and yes // a sandwich homogeneous set was found.
   2.2. else
        2.2.1. $H \leftarrow H \cup B(H)$
3. return no // there are no sandwich homogeneous sets with $k$ vertices
             // or less which contain $\{v_1, v_2\}$.

Fig. 1. The Incomplete Bias Envelopment procedure.


Both algorithms we introduce in this paper are based on a variation of the
Bias Envelopment procedure, which we call the Incomplete Bias Envelopment.
The input of the Incomplete Bias Envelopment is a pair of vertices
$\{v_1, v_2\}$ and a stop parameter $k \le n$. The only change in this incomplete
version is that, whenever $|H_t| > k$, the envelopment stops prematurely,
answering no and rejecting $\{v_1, v_2\}$. Notice that a no answer from the
Incomplete Bias Envelopment with parameter $k$ means that $\{v_1, v_2\}$ is not
contained in any homogeneous set of size at most $k$. Using the same data
structures as in [1], the Incomplete Bias Envelopment runs in $O(nk)$ time.

The Incomplete Bias Envelopment generalizes its complete version, as a nor- 
mal Bias Envelopment is equivalent to an Incomplete Bias Envelopment with 
parameter k = n. 

The pseudo-code for the Incomplete Bias Envelopment is in Figure 1. 
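
As an illustration, the procedure can be sketched in Python as follows. This is a straightforward version that recomputes the bias set from scratch at every step, so it does not reach the time bound obtained with the data structures of [1]; the adjacency-dictionary representation and the function names are assumptions. The sketch returns the set found (or `None` for a no answer) and explicitly answers no once the candidate has grown to the whole vertex set.

```python
def bias_set(adj1, adj2, H):
    """B(H): outside vertices with an E1-edge and a non-E2-edge into H."""
    return {b for b in adj1
            if b not in H and adj1[b] & H and H - adj2[b]}


def incomplete_bias_envelopment(adj1, adj2, v1, v2, k):
    """Return a sandwich homogeneous set containing {v1, v2} with at most
    k vertices, or None when the pair is rejected."""
    n = len(adj1)
    H = {v1, v2}
    while len(H) <= k:
        B = bias_set(adj1, adj2, H)
        if not B:
            # Empty bias set: H qualifies unless it has swallowed all of V.
            return H if len(H) < n else None
        H |= B
    return None  # candidate grew beyond the stop parameter k


# Path 0 - 1 - 2 - 3 as both graphs: no pair survives the envelopment.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(incomplete_bias_envelopment(path, path, 0, 1, 4))  # None

# Mandatory edge 0-2 only; edges 0-1 and 1-2 are optional.
adj1 = {0: {2}, 1: set(), 2: {0}, 3: set()}
adj2 = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: set()}
print(incomplete_bias_envelopment(adj1, adj2, 0, 1, 4))  # {0, 1}
```

Calling the function with `k` equal to the number of vertices gives the complete Bias Envelopment.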

3 The Balanced Subsets Algorithm 

The algorithm we propose in this section is quite similar to the EBE algorithm, 
in the sense that it submits each of the input vertices’ pairs to the process of Bias 
Envelopment. The only difference is that this algorithm establishes a particular 
order in which the vertex pairs are chosen, in such a way that it can benefit, at 
a certain point, from unsuccessful envelopments that have already taken place. 
After some unsuccessful envelopments, a number of vertex pairs have been found 
not to be contained in any sandwich homogeneous sets. This knowledge is then 
made useful by the algorithm, which will stop further envelopments earlier by 
means of calling Incomplete Bias Envelopments instead of complete ones, saving 
relevant time without loss of completeness. 

When the algorithm starts, it partitions all $n$ vertices of the input graphs into
$O(\sqrt{n})$ disjoint subsets $C_i$ of size $O(\sqrt{n})$ each. Then all pairs of
vertices will be submitted to Bias Envelopment in two distinct phases: in the
first phase, all pairs consisting of vertices from the same subset $C_i$ are
bias enveloped (and only those); in the second phase, all remaining pairs (i.e.
those comprising vertices that are not from the same subset $C_i$) are then
bias enveloped. In the end, all possible vertex pairs will have been checked as
to whether or not they belong to some sandwich homogeneous set of the input
instance, just like in the EBE algorithm. The point is: if all Bias
Envelopments in the first phase fail to find a sandwich homogeneous set, then
the input instance does not admit any sandwich homogeneous set which contains
two vertices from the same subset $C_i$. Hence, the maximum size of any possibly
existing homogeneous set is $O(\sqrt{n})$ (the number of subsets into which the
vertices had been dispersed), which grants that all Bias Envelopments of the
second phase need not search for large homogeneous sets! That is why an
Incomplete Bias Envelopment with stop parameter $k = O(\sqrt{n})$ can be used
instead.

BalancedSubsetsHSSP ($G_1(V, E_1)$, $G_2(V, E_2)$)

1. label all vertices in $V$ from $v_1$ to $v_n$.
2. create $\lceil\sqrt{n}\rceil$ empty sets $C_i$.
3. for each vertex $v_j \in V$, do $C_{j \bmod \lceil\sqrt{n}\rceil} \leftarrow C_{j \bmod \lceil\sqrt{n}\rceil} \cup \{v_j\}$
4. for each pair of vertices $\{x, y\}$ in the same subset $C_i$
   4.1. if BiasEnvelopment($G_1$, $G_2$, $x$, $y$) = yes
        4.1.1. return yes.
5. for each pair of vertices $\{x, y\}$ not in the same subset $C_i$
   5.1. if IncompleteBiasEnvelopment($G_1$, $G_2$, $x$, $y$, $\lceil\sqrt{n}\rceil$) = yes
        5.1.1. return yes.
6. return no.

Fig. 2. The Balanced Subsets algorithm for the HSSP.

Figure 2 illustrates the pseudo-code for the Balanced Subsets algorithm. 
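
A compact Python sketch of the two phases is given below. It is meant only to make the control flow concrete: the bias set is recomputed naively, so the running time of this sketch is worse than the bound analyzed in the paper, and the adjacency-dictionary input format and all function names are assumptions.

```python
from itertools import combinations
from math import isqrt


def bias_set(adj1, adj2, H):
    """B(H): outside vertices with an E1-edge and a non-E2-edge into H."""
    return {b for b in adj1
            if b not in H and adj1[b] & H and H - adj2[b]}


def incomplete_bias_envelopment(adj1, adj2, v1, v2, k):
    """Envelop {v1, v2}; give up once the candidate exceeds k vertices."""
    n = len(adj1)
    H = {v1, v2}
    while len(H) <= k:
        B = bias_set(adj1, adj2, H)
        if not B:
            return H if len(H) < n else None
        H |= B
    return None


def balanced_subsets_hssp(adj1, adj2):
    """Return a sandwich homogeneous set, or None if there is none."""
    vertices = list(adj1)
    n = len(vertices)
    if n < 3:
        return None                      # no set with 2 <= |H| <= n - 1
    r = isqrt(n - 1) + 1                 # ceil(sqrt(n)) for n >= 1
    subset_of = {v: j % r for j, v in enumerate(vertices)}
    # Phase 1: same-subset pairs, complete envelopment (k = n).
    # Phase 2: remaining pairs, incomplete envelopment (k = ceil(sqrt(n))).
    for same_subset, k in ((True, n), (False, r)):
        for x, y in combinations(vertices, 2):
            if (subset_of[x] == subset_of[y]) == same_subset:
                H = incomplete_bias_envelopment(adj1, adj2, x, y, k)
                if H is not None:
                    return H
    return None


path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(balanced_subsets_hssp(path, path))  # None
```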

Theorem 1 The Balanced Subsets algorithm correctly answers whether there 
exists a sandwich homogeneous set in the input graphs. 

Proof. If the algorithm returns yes, then it has successfully found a set
$H \subset V$, with $2 \le |H| \le |V| - 1$, such that the bias set of $H$ is
empty. Thus, $H$ is indeed a valid sandwich homogeneous set.

Now, suppose the input has a sandwich homogeneous set $H$. If
$|H| > \lceil\sqrt{n}\rceil$, then there are more vertices in $H$ than subsets
into which all input vertices were spread, in the beginning of the algorithm
(line 3). Thus, by the pigeonhole principle, there must be two vertices
$x, y \in H$ which were assigned to the same subset $C_i$. So, whenever
$\{x, y\}$ is submitted to Bias Envelopment (line 4), the algorithm is doomed to
find a sandwich homogeneous set. On the other hand, if
$|H| \le \lceil\sqrt{n}\rceil$, then it is possible that $H$ does not contain any
two vertices from the same subset $C_i$, which would cause all Bias Envelopments
of the first phase (line 4.1) to fail. In this case, however, when a pair
$\{x, y\} \subseteq H$ happens to be bias enveloped in line 5, the Incomplete
Bias Envelopment is meant to be successful, for the size of $H$ is, by
hypothesis, less than or equal to its stop parameter
$k = \lceil\sqrt{n}\rceil$. □



3.1 Complexity Analysis 

As each subset $C_i$ has $O(n)$ pairs of vertices and there are $O(\sqrt{n})$
such subsets, the number of pairs that are bias enveloped in the first phase of
the algorithm (line 4) is $O(n\sqrt{n})$. All Bias Envelopments, in this phase,
are complete and take $O(n^2)$ time to be executed, which yields a subtotal of
$O(n^{3.5})$ time in the whole first phase.

The number of pairs that are only submitted to Bias Envelopment in the second
phase (line 5) is $O(n^2) - O(n\sqrt{n}) = O(n^2)$ pairs. Each Bias Envelopment
is, now, an incomplete one with parameter
$k = \lceil\sqrt{n}\rceil = O(\sqrt{n})$. Because the time complexity of each
Incomplete Bias Envelopment with parameter $k$ is $O(nk)$, the total time
complexity of the whole second phase of the algorithm is
$O(n^3 k) = O(n^{3.5})$.

Thus, the overall time complexity of the Balanced Subsets algorithm is
$O(n^{3.5})$.
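
The two subtotals can be placed side by side as a worked tally. This only restates the counts above; the labels $T_1$ and $T_2$ for the costs of the first and second phases are introduced here for convenience.

```latex
\begin{align*}
T_1 &= \underbrace{O(\sqrt{n}) \cdot O(n)}_{\text{same-subset pairs}}
       \cdot \underbrace{O(n^2)}_{\text{complete envelopment}}
     = O(n^{3.5}), \\
T_2 &= \underbrace{O(n^2)}_{\text{remaining pairs}}
       \cdot \underbrace{O(n \cdot \sqrt{n})}_{\text{incomplete, } k = \lceil\sqrt{n}\rceil}
     = O(n^{3.5}), \\
T_1 + T_2 &= O(n^{3.5}).
\end{align*}
```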



4 The Monte Carlo Algorithm 



A yes-biased Monte Carlo algorithm for a decision problem is one which always
answers no when the correct answer is no and which answers yes with probability
p whenever the correct answer is yes.

In order to gather some intuition, let us suppose the input has a sandwich 
homogeneous set H with h vertices or more. 

What would be, in this case, the probability $\bar{p}_1$ that a random pair of
vertices $\{x, y\} \subseteq V$ is not contained in $H$? Clearly,

$$\bar{p}_1 \le 1 - \frac{h(h-1)}{n(n-1)}.$$

What about the probability $\bar{p}_t$ that $t$ random pairs of vertices fail to
be contained in $H$? It is easy to see that

$$\bar{p}_t \le \left(1 - \frac{h(h-1)}{n(n-1)}\right)^t.$$

Now, what is the probability $p_t$ that, after $t$ Bias Envelopment procedures
have been run (starting from $t$ randomly chosen pairs of vertices), a sandwich
homogeneous set has been found? Again, it is quite simple to reach the following
expression, which will be vital to the forthcoming reasoning.

$$p_t \ge 1 - \left(1 - \frac{h(h-1)}{n(n-1)}\right)^t \qquad (1)$$

If, instead of obtaining the probability $p_t$ from the expression above, we fix
$p_t$ at some desired value $p = 1 - \epsilon$, we will be able to calculate the
minimum integer value of $h_t$ (which will denote $h$ as a function of $t$) that
satisfies inequality 1. This value $h_t$ is such that the execution of $t$
independent Bias Envelopment procedures (on $t$ random pairs) is sufficient to
find a sandwich homogeneous set of the input instance with probability at least
$p$, in case there exists any with $h_t$ vertices or more (see equation 2):

$$h_t \ge \frac{1 + \sqrt{1 + 4(n^2 - n)\left(1 - (1-p)^{1/t}\right)}}{2} \qquad (2)$$

However, we want an algorithm that finds a sandwich homogeneous set with
some fixed probability $p$ in case there exists any, no matter its size. But as
$h_t$ decreases with the growth of $t$, the following question arises: how many
random pairs do we need to submit to Bias Envelopment in order to achieve that?
The answer is rather simple: the minimum integer $t'$ such that $h_{t'} = 2$,
for 2 is the smallest possible size of a sandwich homogeneous set!

Determining $t'$ comes straightforwardly from equation 2 (please refer to
Section 4.1 for the detailed calculations):

$$t' = \frac{\ln(1-p)}{\ln\left(1 - \frac{2}{n(n-1)}\right)} \qquad (3)$$
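
To get a feeling for these quantities, the right-hand sides of equations 2 and 3 can be evaluated numerically. The sketch below uses hypothetical helper names (`h_bound`, `trials_needed`) and returns real values without rounding; it also illustrates that plugging the number of trials from equation 3 back into equation 2 gives the threshold 2.

```python
import math


def h_bound(n, p, t):
    """Right-hand side of equation 2: size threshold after t random pairs."""
    q = 1 - (1 - p) ** (1 / t)
    return (1 + math.sqrt(1 + 4 * (n * n - n) * q)) / 2


def trials_needed(n, p):
    """Right-hand side of equation 3 (requires n >= 3 and 0 < p < 1)."""
    return math.log(1 - p) / math.log(1 - 2 / (n * (n - 1)))


n, p = 10, 0.5
t_prime = trials_needed(n, p)
print(t_prime)                 # roughly 30.8 random pairs for n = 10, p = 0.5
print(h_bound(n, p, t_prime))  # approximately 2, as expected
print(h_bound(n, p, 1) > h_bound(n, p, 2))  # True: the threshold decreases
```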



Since the number $t'$ of Bias Envelopment procedures that need to be undertaken
on randomly chosen pairs of vertices is $\Theta(n^2)$ and the time complexity of
each Bias Envelopment is $O(n^2)$, so far we seem to have been led to an
$O(n^4)$ randomized algorithm, which is totally undesirable, for we could
already solve the problem deterministically with less asymptotic effort (see
Section 3)!

Now we have come to a point where the incomplete version of the Bias
Envelopment procedure will play an essential role as far as time saving goes. We
show that, by the time the $t$-th Bias Envelopment is run, its incomplete version
with stop parameter $k = h_{t-1}$ serves exactly the same purpose as its complete
version would do.



Lemma 2 In order to find a sandwich homogeneous set, with probability $p$, in
case there exists any with $h_t$ vertices or more, the $t$-th Bias Envelopment
need not go further when the size of the candidate set has exceeded $h_{t-1}$.

Proof. Two are the possibilities regarding the input: (i) there is a sandwich
homogeneous set with more than $h_{t-1}$ vertices; or (ii) there are no sandwich
homogeneous sets with more than $h_{t-1}$ vertices.

If (i) is true, then no more than $t - 1$ Bias Envelopments would even be
required to achieve that. Hence the $t$-th Bias Envelopment can stop as early as
it pleases.

If (ii) is the case, then an Incomplete Bias Envelopment with stop parameter
$k = h_{t-1}$ is meant to give the exact same answer as the complete Bias
Envelopment would, for there are no sandwich homogeneous sets with more than
$h_{t-1}$ vertices to be found.

Whichever the case, thus, such an Incomplete Bias Envelopment is perfectly
sufficient. □

MonteCarloHSSP ($G_1(V, E_1)$, $G_2(V, E_2)$, $p$)

1. $h \leftarrow |V|$
2. $t \leftarrow 1$
3. while $h \ge 2$
   3.1. $(v_1, v_2) \leftarrow$ random pair of distinct vertices of $V$
   3.2. if IncompleteBiasEnvelopment($G_1$, $G_2$, $v_1$, $v_2$, $h$) = yes
        3.2.1. return yes
   3.3. $h \leftarrow \left\lfloor \left(1 + \sqrt{1 + 4(|V|^2 - |V|)\left(1 - (1-p)^{1/t}\right)}\right)/2 \right\rfloor$
   3.4. $t \leftarrow t + 1$
4. return no

Fig. 3. The Monte Carlo algorithm for the HSSP.


Now we can describe an efficient Monte Carlo algorithm which gives the 
correct answer to the HSSP with probability at least p. 

The algorithm’s idea is to run several Incomplete Bias Envelopment procedures
on randomly chosen initial candidate sets (pairs of vertices). At each
iteration $t$ of the algorithm we run an Incomplete Bias Envelopment with stop
parameter $k = h_{t-1}$ and either it succeeds in finding a sandwich homogeneous
set (and the algorithm stops with a yes answer) or else it aborts the current
envelopment whenever the number of vertices in the sandwich homogeneous set
candidate exceeds the $h_{t-1}$ threshold. (In this case, Lemma 2 grants its
applicability.) For the first iteration, the stop parameter $k$ is set to
$h_0 = n$, as the first iteration corresponds to a complete Bias Envelopment. At
the end of each iteration, the value of $h_t$ is then updated (see equation 2),
which makes it progressively decrease throughout the iterations until it reaches
2 (the minimum size allowed for a homogeneous set), which necessarily happens
after $\Theta(n^2)$ iterations (see equation 3).

The pseudo-code for this algorithm is in Figure 3. 
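
The loop of Figure 3 can be sketched in Python as shown below. The sketch follows the control flow only: the bias set is recomputed naively, the input graphs are assumed to be adjacency dictionaries, the `rng` parameter is a hypothetical hook for passing a seeded generator, and the probability `p` is assumed to satisfy 0 < p < 1 (with p = 1 the threshold would never shrink).

```python
import math
import random


def bias_set(adj1, adj2, H):
    """B(H): outside vertices with an E1-edge and a non-E2-edge into H."""
    return {b for b in adj1
            if b not in H and adj1[b] & H and H - adj2[b]}


def incomplete_bias_envelopment(adj1, adj2, v1, v2, k):
    """Envelop {v1, v2}; give up once the candidate exceeds k vertices."""
    n = len(adj1)
    H = {v1, v2}
    while len(H) <= k:
        B = bias_set(adj1, adj2, H)
        if not B:
            return H if len(H) < n else None
        H |= B
    return None


def monte_carlo_hssp(adj1, adj2, p, rng=random):
    """Yes-biased Monte Carlo search: a returned set is always valid,
    while None may be wrong with probability at most 1 - p."""
    vertices = list(adj1)
    n = len(vertices)
    if n < 3:
        return None
    h, t = n, 1
    while h >= 2:
        v1, v2 = rng.sample(vertices, 2)
        H = incomplete_bias_envelopment(adj1, adj2, v1, v2, h)
        if H is not None:
            return H
        # Shrink the stop parameter as in step 3.3 of Figure 3.
        root = math.sqrt(1 + 4 * (n * n - n) * (1 - (1 - p) ** (1 / t)))
        h = math.floor((1 + root) / 2)
        t += 1
    return None


path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(monte_carlo_hssp(path, path, 0.9))  # None: the path admits no such set
```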

Theorem 3 The Monte Carlo HSSP algorithm correctly answers whether there
exists a sandwich homogeneous set in the input graphs with probability at least
$p$.

Proof. If the algorithm returns yes, then it is the consequence of having found a
set $H \subset V$, with $2 \le |H| \le |V| - 1$, such that the bias set of $H$ is
empty, which makes a valid sandwich homogeneous set out of $H$. In other words,
if the correct answer is no then the algorithm gives a correct no answer with
probability 1.

If the correct answer is yes, we want to show that it gives a correct yes answer
with probability at least $p$. Let $h^*$ be the size of the largest sandwich
homogeneous set of the input instance. As $h_0 = n$ and the algorithm only
answers no after $h_t$ has lowered down to 2, there must exist an index $d$ such
that $h_d \le h^* < h_{d-1}$. From the definition of $h_t$ we know that, on the
hypothesis that the input has a sandwich homogeneous set with $h_t$ vertices or
more, $t$ Bias Envelopments are sufficient to find one, with probability at
least $p$. As, by hypothesis, there is a sandwich homogeneous set with
$h^* \ge h_d$ vertices, then $d$ independent Bias Envelopments are sufficient to
find a sandwich homogeneous set with probability at least $p$. So, it is enough
to show that this quota of $d$ Bias Envelopments is achieved. It is true that
Incomplete Bias Envelopments that stop before the candidate set has reached the
size of $h^*$ cannot find a sandwich homogeneous set with $h^*$ vertices.
Nevertheless, the first $d$ iterations alone perform this minimum quota of Bias
Envelopments. Because $h^*$ is the size of the largest sandwich homogeneous set,
the fact of being incomplete simply does not matter for these first $d$ Bias
Envelopments, none of which is allowed to stop before the size of the candidate
set has become larger than $h_{d-1} > h^*$. □

4.1 Complexity Analysis 

The first iteration of the algorithm runs the complete Bias Envelopment in
$O(n^2)$ time [1]. (Actually, a more precise bound is given by
$O(m_1 + \overline{m_2})$ [2], but, as the complexities of the Incomplete Bias
Envelopment procedures do not benefit at all from having edge quantities in
their analysis, we prefer to write time bounds only as functions of $n$.) The
remaining iterations take $O(n h_{t-1})$ time each. To analyze the time
complexity of the algorithm, we have to calculate

$$\sum_{t=1}^{t'} O(n h_{t-1}),$$

where $t'$ is the number of iterations in the worst case.

The value of $h_t$, obtained at the end of iteration $t$, is defined by
equation 2. To calculate $t'$, we replace $h_{t'}$ by 2 and have

$$\left(1 - \frac{2}{n(n-1)}\right)^{t'} = 1 - p, \quad \text{and finally} \quad
t' = \frac{\ln(1-p)}{\ln\left(1 - \frac{2}{n(n-1)}\right)}.$$

For $0 < x < 1$, it is known that

$$\ln(1 - x) = -x - \frac{x^2}{2} - \frac{x^3}{3} - \cdots.$$

Consequently,

$$t' = \frac{\ln(1-p)}{\ln\left(1 - \frac{2}{n(n-1)}\right)}
     = \frac{\ln(1-p)}{-\Theta(1/n^2)}
     = \ln\left(\frac{1}{1-p}\right)\Theta(n^2) = \Theta(n^2).$$

Now, we will show that $q = h(h-1)/n(n-1) \ge h^2/2n^2$. This result is useful
to simplify some calculations. We have

$$\frac{n}{n-1} \cdot \frac{h-1}{h} \cdot \frac{h^2}{n^2} = \frac{h(h-1)}{n(n-1)},
\quad \text{and} \quad
\frac{h-1}{h} \cdot \frac{h^2}{n^2} \le \frac{h(h-1)}{n(n-1)}.$$

Since $h \ge 2$,

$$\frac{h^2}{2n^2} \le \frac{h(h-1)}{n(n-1)}.$$

To calculate the total time complexity, we replace $h(h-1)/n(n-1)$ by
$h^2/2n^2$ and $p_t$ by the fixed value $p$ in equation 1, and have

$$\frac{h^2}{2n^2} \le 1 - (1-p)^{1/t}, \quad \text{and} \quad
h \le O(n)\sqrt{1 - (1-p)^{1/t}}.$$

It is well known that

$$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots.$$

Consequently, for $x > 1$,

$$e^{-1/x} = 1 - 1/\Theta(x).$$

Using this approximation, with $(1-p)^{1/t} = e^{\ln(1-p)/t}$, we have

$$h \le O(n)\sqrt{1 - \left(1 - 1/\Theta(t)\right)} = O(n)/\Theta(\sqrt{t}).$$

The total time complexity of the algorithm is

$$\sum_{t=1}^{\Theta(n^2)} O(n h_{t-1})
  = \sum_{t=1}^{\Theta(n^2)} O(n) \frac{O(n)}{\Theta(\sqrt{t})}
  = O(n^2) \sum_{t=1}^{\Theta(n^2)} 1/\Theta(\sqrt{t}).$$

Using elementary calculus, we have

$$\sum_{t=1}^{\Theta(n^2)} 1/\Theta(\sqrt{t}) = O(n).$$

Consequently, the total time complexity of the algorithm is $O(n^3)$.
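
The elementary calculus step can be made explicit by comparing the sum with an integral. In the sketch below, $T$ is a placeholder introduced here for the $\Theta(n^2)$ upper limit of the sum.

```latex
\[
\sum_{t=1}^{T} \frac{1}{\sqrt{t}}
\;\le\; 1 + \int_{1}^{T} \frac{dt}{\sqrt{t}}
\;=\; 2\sqrt{T} - 1,
\qquad \text{so } T = \Theta(n^2) \text{ gives }
\sum_{t=1}^{T} \frac{1}{\sqrt{t}} = O(n).
\]
```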
5 Conclusion

In this article, we presented two efficient algorithms for the Homogeneous Set
Sandwich Problem: the first was an $O(n^{3.5})$ deterministic algorithm and the
other, an $O(n^3)$ Monte Carlo one. The best results so far had been $O(n^4)$,
if only functions of $n$ are used to express time complexities.

A natural step, after having developed such a Monte Carlo algorithm, is often 
the development of a related Las Vegas algorithm, i.e. an algorithm which always 
gives the right answer in some expected polynomial time. Unfortunately, we do 
not know of any short certificate for the non-existence of sandwich homogeneous 
sets in some given HSSP instance, which surely complicates matters and suggests 
a little more research on this issue. 
Abstract. In many applications NP-complete problems need to be
solved exactly. One promising method to treat some intractable problems
is by considering the so-called Parameterized Complexity that divides
the problem input into a main part and a parameter. The main
part of the input contributes polynomially to the total complexity of the
problem, while the parameter is responsible for the combinatorial
explosion. We consider the parallel FPT algorithm of Cheetham et al. to solve
the k-Vertex Cover problem, using the CGM model. Our contribution is
to present a refined and improved implementation. In our parallel
experiments, we obtained better running times and smaller cover sizes for
some input data. The key idea for these results was the choice of good
data structures and use of the backtracking technique. We used 5 graphs
that represent conflict graphs of amino acids, the same graphs used
by Cheetham et al. in their experiments. For two of these graphs, the
times we obtained were approximately 115 times better, for one of them
16 times better, and, for the remaining graphs, the obtained times were
slightly better. We must also emphasize that we used a computational
environment that is inferior to that used in the experiments of Cheetham
et al. Furthermore, for three graphs, we obtained smaller sizes for the
cover.



1 Introduction 

In many applications, we need to solve NP-complete problems exactly. This
means we need a new approach in addition to solutions such as approximation
algorithms, randomization or heuristics.

One promising method to treat some intractable problems is by considering
the so-called Parameterized Complexity [1]. The input problem is divided into
two parts: the main part containing the data set and a parameter. For example,
in the parameterized version of the Vertex Cover problem for a graph
$G = (V, E)$, also known as the k-Vertex Cover, we want to determine if there is
a subset of V of size at most k such that every edge of G is incident with at
least one vertex of this subset. In this problem, the input is a graph G (the
main part) and a non-negative integer k (the parameter). For simplicity, a
problem whose input can be divided like this is said to be parameterized.

A parameterized problem is said to be fixed-parameter tractable, or FPT
for short, if there is an algorithm that solves the problem in
$O(f(k)n^{\alpha})$ time, where $\alpha$ is a constant and $f$ is an arbitrary
function [1]. If we exchange the multiplicative connective between these two
contributions by an additive connective ($f(k) + n^{\alpha}$), the definition
of FPT problems remains unchanged. The main part of the input contributes
polynomially to the total complexity of the problem, while the parameter is
responsible for the combinatorial explosion. This approach is feasible if the
constant $\alpha$ is small and the parameter k is within a tight interval. The
k-Vertex Cover problem is one of the first problems proved to be FPT and is the
focus of this work. One of the well-known FPT algorithms for this problem is the
algorithm of Balasubramanian et al. [2], of time complexity
$O(kn + 1.324718^k k^2)$, where n is the size of the graph and k is the maximum
size of the cover. This problem is very important from the practical point of
view. For example, in Bioinformatics we can use it in the analysis of multiple
sequence alignment.

Two techniques are usually applied in the design of FPT algorithms: the
reduction to problem kernel and the bounded search tree. These techniques can be
combined to solve the problem.

FPT algorithms have been implemented and they constitute a promising
approach to solving problems exactly. Nevertheless, the exponential
complexity in the parameter can still result in a prohibitive cost. In this
article, we show how we can solve larger instances of the k-Vertex Cover, using
the CGM parallel model.

A CGM (Coarse-Grained Multicomputer) [3] consists of p processors connected
by some interconnection network. Each processor has local memory of
size $O(N/p)$, where N is the problem size. A CGM algorithm alternates
between computation and communication rounds. In a communication round each
processor can send and receive a total of $O(N/p)$ data.

The CGM algorithm presented in this paper was designed by Cheetham
et al. [4] and requires $O(\log p)$ communication rounds. It has two phases: in
the first phase a reduction to problem kernel is applied; the second phase
consists of building a bounded search tree that is distributed among the
processors.

Cheetham et al. implemented the algorithm and the results are presented
in [4]. Our contribution is to present a refined and improved implementation. In
our parallel experiments, we obtained better running times and smaller cover
sizes for some input data. The key idea for these results was the choice of good
data structures and use of the backtracking technique. We used 5 graphs that
represent conflict graphs of amino acids; these same graphs were also used
by Cheetham et al. [4] in their experiments. For two of these graphs, the times
we obtained were approximately 115 times better, for one of them 16 times
better, and, for the remaining graphs, the obtained times were slightly better.
We must also emphasize that we used a computational environment that is inferior
to that used in the experiments of Cheetham et al. [4]. Furthermore, for three
graphs, we obtained smaller sizes for the cover.

In the next section we introduce some important concepts. In Section 3 we 
present the main FPT sequential algorithms for the problem and the CGM ver- 
sion. In Section 4 we present the data structures and discuss the implementation 
and in Section 5 we show the experimental results. In Section 6 we present some 
conclusions. 

2 Parameterized Complexity and k-Vertex Cover Problem

We present some fundamental concepts for sequential and CGM versions of the
FPT algorithm for the k-Vertex Cover problem.

Parameterized complexity [1, 5, 6, 7, 8] is another way of dealing with the
intractability of some problems. This method has been successfully used to solve
problems of instance sizes that otherwise cannot be solved by other means [7].

The input of the problem is divided into two parts: the main part and the 
parameter. There exist some computational problems that can be naturally spec- 
ified in this manner [5]. 

In classical computational complexity, the entire input of the problem is
considered to be responsible for the combinatorial explosion of the intractable
problem. In parameterized complexity, we try to understand how the different
parts of the input contribute to the total complexity of the problem, and we
wish to identify those input parts that cause the apparently inevitable
combinatorial explosion. The main input part contributes polynomially to the
total complexity of the problem, while the parameter part probably contributes
exponentially to the total complexity. Thus, in cases where we manage to do
this, NP-complete problems can be solved by algorithms of exponential time
with respect to the parameter and polynomial time with respect to the main
input part. Even then we need to confine the parameter to a small, but useful,
interval. In many applications, the parameter can be considered “very small”
when compared to the main input part.

A parameterizable problem is a set $L \subseteq \Sigma^* \times \Sigma^*$, where
$\Sigma$ is a fixed alphabet. If the pair $(x, y) \in L$, we call x the main
input part (or instance) and y the parameter.

According to Downey and Fellows [1], a parameterizable problem
$L \subseteq \Sigma^* \times \Sigma^*$ is fixed parameter tractable if there
exists an algorithm that, given an input $(x, y) \in L$, solves it in
$O(f(k)n^{\alpha})$ time, where n is the size of the main input part x,
$|x| = n$, k is the size of parameter y, $|y| = k$, $\alpha$ is a constant
independent of k, and f is an arbitrary function.

The arbitrary function $f(k)$ of the definition is the contribution of the
parameter y to the total complexity of the problem. Probably this contribution
is exponential. However, the main input part contributes polynomially to the
total complexity of the problem. The basic assumption is that $k \ll n$ [8]. The
polynomial contribution is acceptable if the constant $\alpha$ is small.
However, the definition of fixed parameter tractable problem remains unchanged
if we exchange the multiplicative connective between the two contributions,
$f(k)n^{\alpha}$, by an additive connective $f(k) + n^{\alpha}$ [1].

The fixed parameter tractable problems form a class of problems called FPT
(Fixed-Parameter Tractability). There are NP-complete problems that have been
proven not to be in the FPT class.

An important issue when comparing the performance of FPT algorithms is the
maximum size of the parameter k that does not affect the desired efficiency of
the algorithm. This value is called klam and is defined as the largest value
of k such that $f(k) \le U$, where U is some absolute limit on the number of
computational steps. Downey and Fellows [1] suggest $U = 10^{20}$. A challenge
in the fixed parameter tractable problems is the design of FPT algorithms with
increasingly larger values of klam.
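
As a small illustration of the definition, klam can be computed by direct search. The sketch below assumes a non-decreasing function f, a step limit of U = 10^20, and uses f(k) = 2^k purely as a hypothetical stand-in for the parameter contribution of some FPT algorithm; the function name `klam` is an illustrative choice.

```python
def klam(f, U=10**20):
    """Largest k >= 0 with f(k) <= U, assuming f is non-decreasing."""
    k = 0
    while f(k + 1) <= U:
        k += 1
    return k


# Hypothetical parameter contribution f(k) = 2**k.
print(klam(lambda k: 2**k))  # 66, since 2**66 <= 10**20 < 2**67
```

A slower-growing f(k) yields a larger klam, which is why lowering the base of the exponential term is a common goal in the design of FPT algorithms.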

Two elementary methods are used to design algorithms for fixed parameter
tractable problems: reduction to problem kernel and bounded search tree. The
application of these methods, in this order, as an algorithm of two phases, is
the basis of several FPT algorithms. In spite of being simple algorithmic
strategies, these techniques do not come to mind immediately, since they involve
exponential costs relative to the parameter [6].

— Reduction to problem kernel: The goal is to reduce, in polynomial time, 
an instance I of the parameterizable problem into another equivalent in- 
stance whose size is limited by a function of the parameter k. If a solu- 
tion of I' is found, probably after an exhaustive analysis of the generated 
instance, this solution can be transformed into a solution of I. The use of this 
technique always results in an additive connective between the contributions 

and f{k) on the total complexity. 

— Bounded search tree: This technique attempts to solve the problem 
through an exhaustive tree search, whose size is to be bounded by a function 
of the parameter k. Therefore, we use the instance generated by the reduc- 
tion to problem kernel method in the search tree, which must be traversed 
until we find a node with the solution of the instance. In the worst case, 
we have to traverse all the tree. However, it is important to emphasize that 
the tree size depends only on the parameter, limiting the search space by a 
function of k. 
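As an illustration of the bounded search tree idea, the following minimal sketch uses the simplest textbook branching rule for vertex cover: pick any remaining edge and branch on its two endpoints. This is not one of the algorithms studied in this paper; the function name `vc_search` and the edge-list representation are assumptions made for the example. The recursion depth is at most k, so the tree has at most 2^k leaves regardless of the size of the graph.

```python
def vc_search(edges, k):
    """Return a vertex cover of size <= k as a set, or None if none exists."""
    if not edges:
        return set()          # nothing left to cover
    if k == 0:
        return None           # edges remain but the budget is exhausted
    u, v = edges[0]
    # Any cover must contain u or v, so the node has two sons.
    for w in (u, v):
        rest = [e for e in edges if w not in e]
        sub = vc_search(rest, k - 1)
        if sub is not None:
            return sub | {w}
    return None


triangle = [(0, 1), (1, 2), (0, 2)]
print(vc_search(triangle, 1), vc_search(triangle, 2))
```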

In the parameterized version of the Vertex Cover problem, also known as the
k-Vertex Cover problem, we are given a graph G = (V, E) (the instance) and a
non-negative integer k (the parameter). We want to answer the following
question: “Is there a set V' ⊆ V of vertices, of size at most k, such that for
every edge (u, v) ∈ E, u ∈ V' or v ∈ V'?”. Many other graph problems can be
parameterized similarly.

The set V' is not unique. An application of the vertex cover problem is the
analysis of multiple sequence alignments [4]. One way to resolve the conflicts
among sequences is to exclude some of them from the sample. A conflict exists
when two sequences have a score below a certain threshold. We can construct
a graph, called the conflict graph, where each sequence is a vertex and an edge
links two conflicting sequences. Our goal is to remove the least number of
sequences so that all conflicts are eliminated. We thus want to find a minimum
vertex cover for the conflict graph.

A trivial exact algorithm for this problem is to use brute force. In this case
all the possible subsets of size at most k are checked to see whether they form
a cover [1], where k is the maximum size desired for the cover and n is the
number of vertices in the graph (k < n). The number of subsets with k elements
is $C_{n,k}$, so the algorithm that enumerates all these subsets has time
complexity $O(n^k)$. The costly brute force approach is usually not feasible in
practice.
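The brute force approach can be sketched as follows. The function names `is_cover` and `brute_force_cover` and the representation of the graph as a vertex list plus an edge list are assumptions for illustration only.

```python
from itertools import combinations


def is_cover(edges, subset):
    """True if every edge has at least one endpoint in subset."""
    s = set(subset)
    return all(u in s or v in s for u, v in edges)


def brute_force_cover(vertices, edges, k):
    """Try every subset of at most k vertices; return the first cover found."""
    for size in range(k + 1):
        for subset in combinations(vertices, size):
            if is_cover(edges, subset):
                return set(subset)
    return None


# A cycle on four vertices needs two vertices to cover its edges.
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(brute_force_cover(range(4), cycle, 1), brute_force_cover(range(4), cycle, 2))
```

Because subsets are tried in order of increasing size, the first cover returned is also a minimum one among the sizes tested.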

3 FPT Algorithms for the k-Vertex Cover Problem

In this section we present FPT algorithms that solve the vertex cover problem 
and are used in our implementation. Initially we show the algorithm of Buss [9], 
responsible for the phase of reduction to problem kernel. Then we show two 
algorithms of Balasubramanian et al. [2] that present two forms to construct 
the bounded search tree. Finally we present the CGM algorithm of Cheetham
et al. [4]. In all these algorithms, the input is formed by a graph G and the
desired size of the vertex cover (parameter k).



3.1 Algorithm of Buss 

The algorithm of Buss [9] is based on the idea that all the vertices of degree 
greater than k belong to any vertex cover for graph G of size smaller or equal 
to k. Therefore, such vertices must be added to the partial cover and removed 
from the graph. If there are more than k vertices in this situation, there is no 
vertex cover of size smaller or equal to k for the graph G. 

The edges incident with the vertices of degree greater than k can also be
removed since they are covered by at least one vertex of the cover, and the
isolated vertices are removed since they have no edges left to cover. The graph
produced is denoted G'.

From now on, our goal is to find a vertex cover of size smaller or equal to k'
for the graph G', where k' is the difference between k and the number of elements
of the partial vertex cover. This is only possible if there are no more than
kk' edges in G'. This is because k' vertices can cover at most kk' edges in the
graph, since the vertices of G' have degree bounded by k. Furthermore, if there
are no more than kk' edges in G', and no isolated vertices, we can conclude that
there are at most 2kk' vertices in G'. As k' is at most k, the size of the graph
G' is $O(k^2)$.

Given the adjacency list of the graph, the steps described so far take
O(kn) time and form the basis for the reduction to problem kernel phase. Observe
that graph G is reduced, in polynomial time, to an equivalent graph G', whose
size is bounded by a function of the parameter k. The kernelization phase as
described is used in the algorithms presented in the next subsection.

Finally, to determine whether there exists a vertex cover for G' of size smaller
or equal to k', the algorithm of Buss [9] executes a brute force algorithm. If a
vertex cover for G' of size smaller or equal to k' exists, these vertices and the
vertices of degree greater than k form a vertex cover for G of size smaller or
equal to k. The algorithm of Buss [9] spends a total time of
$O(kn + (2k^2)^k k^2)$.
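The reduction to problem kernel described in this subsection can be sketched sequentially as follows. The function name `buss_kernel`, the edge-list input and the returned triple are assumptions for illustration; the sketch applies the degree rule once, as in the description above, and leaves the final search on the kernel to a separate routine.

```python
def buss_kernel(edges, k):
    """Return (forced, kernel_edges, k_prime), or None if no cover of size <= k exists."""
    edges = {frozenset(e) for e in edges}        # drop duplicate edges
    degree = {}
    for e in edges:
        for v in e:
            degree[v] = degree.get(v, 0) + 1
    # Vertices of degree greater than k belong to every cover of size <= k.
    forced = {v for v, d in degree.items() if d > k}
    if len(forced) > k:
        return None
    k_prime = k - len(forced)
    # Removing the forced vertices removes their edges; isolated vertices vanish
    # automatically because only edges are stored.
    kernel = sorted(tuple(sorted(e)) for e in edges if not (e & forced))
    # k' vertices of degree at most k cover at most k * k' edges.
    if len(kernel) > k * k_prime:
        return None
    return forced, kernel, k_prime


star_plus_edge = [(0, 1), (0, 2), (0, 3), (0, 4), (5, 6)]
print(buss_kernel(star_plus_edge, 2))
```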

3.2 Algorithms of Balasubramanian et al. 

The algorithms of Balasubramanian et al. [2] initially execute the phase of
reduction to problem kernel based on the algorithm of Buss [9]. In the second
phase, a bounded search tree is generated. The two options to generate the
bounded search tree are shown in Balasubramanian et al. [2] and described below
as Algorithm B1 and Algorithm B2. In both cases, we search the tree nodes
exhaustively for a solution of the vertex cover problem, by depth first tree
traversal. The difference between the two algorithms is the way we choose the
vertices to be added to the partial cover and, consequently, the shape of the
tree.

Each node of the search tree stores a partial vertex cover and a reduced
instance of the graph. This partial cover is composed of the vertices that belong
to the cover. The reduced instance is formed by the graph G'' resulting from the
removal of the vertices of G that are in the partial cover, as well as the edges
incident with them and any isolated vertex, together with an integer k'' that is
the maximum desired size for the vertex cover of G''. The root of the
search tree, for example, represents the situation after the method of reduction
to problem kernel. In other words, at the root the partial cover contains the
vertices of degree greater than k and the instance is (G', k').

The edges of the search tree represent the several possibilities of adding
vertices to the existing partial cover. Notice that the son of a tree node has more
elements in the partial vertex cover and a graph with fewer vertices and edges than
its parent, since every time a vertex is added to the partial cover, we remove it 
from the graph, together with the incident edges and any isolated vertices. We 
actually do not generate all the nodes before the depth first tree traversal. We 
only generate a node of the bounded search tree when this node is visited. 

The search tree has the following property: for each existing vertex cover for 
graph G of size smaller or equal to k, there exists a corresponding tree node with 
a resulting empty graph and a vertex cover (not necessarily the same) of size 
smaller or equal to k. However, if there is no vertex cover of size smaller or equal 
to k for graph G, then no tree node possesses a resulting empty graph. Actually
the growth of the search tree is interrupted when the partial vertex cover of
the node reaches size k or the resulting graph is empty (the case in which
we find a valid vertex cover for graph G). Notice that this bounds the size of
the tree in terms of the parameter k. Therefore, in the worst case, we have to
traverse the whole search tree to determine whether there exists a vertex cover
of size smaller or equal to k for graph G.

Given the adjacency list of the graph, we spend O(m) time in each node,
where m is the number of vertices of the current graph. Therefore, if C(k) is the
number of nodes of the search tree, then the time spent to traverse the whole tree
is O(mC(k)). Recall that the root node of the search tree stores the graph
resulting from the phase of reduction to problem kernel, whose size is bounded
by $O(k^2)$.



Algorithm B1. In this algorithm, the choice of the vertices of G'' to be added
to the partial cover in any tree node is done according to a path generated from
any vertex v of G'' that passes through at most three edges.

If this path has size one or two, then we add the neighbor of the vertex of
degree one to the partial cover, and remove its incident edges and any isolated
vertices. This new graph instance with the new partial cover is kept in the same
node of the bounded search tree and Algorithm B1 is applied again in this
node.

If this path is a simple path of size three, passing through vertices v, v1, v2
and v3, any vertex cover must contain {v, v2} or {v1, v2} or {v1, v3}. If the
path is a simple cycle of size three, passing through vertices v, v1, v2 and v,
any vertex cover must contain {v, v1} or {v1, v2} or {v, v2}. In both cases, the
tree node is ramified into three sons, each adding one of the three suggested
pairs of vertices. We can then go to the next node of the tree, following the
depth first traversal.

Notice that this algorithm generates a ternary search tree and that at each
tree level the partial cover increases by at least two vertices. Algorithm B1
spends $O(kn + (\sqrt{3})^k k^2)$ time to solve the k-Vertex Cover problem.
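The three-way branching of Algorithm B1 relies on the claim that every vertex cover of a path with three edges, or of a cycle with three edges, contains one of three vertex pairs. The sketch below checks this exhaustively on the two small graphs. The helper names and the vertex labels "v", "v1", "v2", "v3" are assumptions for illustration, with the pairs {v, v2}, {v1, v2}, {v1, v3} for the path and {v, v1}, {v1, v2}, {v, v2} for the cycle.

```python
from itertools import combinations


def covers(edges, s):
    """True if the vertex set s touches every edge."""
    return all(u in s or v in s for u, v in edges)


def every_cover_contains_one_of(vertices, edges, candidates):
    """Check that each vertex cover of the graph is a superset of some candidate."""
    for size in range(len(vertices) + 1):
        for subset in combinations(vertices, size):
            s = set(subset)
            if covers(edges, s) and not any(c <= s for c in candidates):
                return False
    return True


path_edges = [("v", "v1"), ("v1", "v2"), ("v2", "v3")]
path_pairs = [{"v", "v2"}, {"v1", "v2"}, {"v1", "v3"}]
cycle_edges = [("v", "v1"), ("v1", "v2"), ("v2", "v")]
cycle_pairs = [{"v", "v1"}, {"v1", "v2"}, {"v", "v2"}]

print(every_cover_contains_one_of(["v", "v1", "v2", "v3"], path_edges, path_pairs))
print(every_cover_contains_one_of(["v", "v1", "v2"], cycle_edges, cycle_pairs))
```

Since each son adds two vertices to the partial cover, a budget of k allows at most k/2 levels of three-way branching, which is where the $(\sqrt{3})^k$ factor comes from.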



Algorithm B2. In this algorithm, the choice of vertices of G'' to be added
to the partial cover in any node of the tree is done according to five cases by
considering the degree of the vertices of the resulting graph. We deal first with
the vertices of degree 1 (Case 1), then with vertices of degree 2 (Case 2), then
with vertices of degree 5 or more (Case 3), then with vertices of degree 3 (Case
4) and, finally, with vertices of degree 4 (Case 5).

We use the following notation. N(v) represents the set of vertices that are
neighbors of vertex v and N(S) represents the set $\bigcup_{v \in S} N(v)$.

In Case 1, if there exists a vertex v of degree 1 in the graph, then we create
a new son to add N(v) to the partial cover.

In Case 2, if there exists a vertex v of degree 2 in the graph, then we can have
three subcases, to be tested in the following order. Let x and y be the neighbors
of v. In Subcase 1, if there exists an edge between x and y, then we create a new
son to add N(v) to the partial cover. In Subcase 2, if x and y have at least two
neighbors different from v, then we ramify the node of the tree into two sons to
add N(v) and N({x, y}) to the partial cover. In Subcase 3, if x and y share a
single neighbor a different from v, then we create a new son to add {v, a}.

In Case 3, if there exists a vertex v of degree 5 or more in the graph, then we
ramify the node of the tree into two sons to add v and N(v) to the partial cover.

If none of the three previous cases occurs, then every vertex of the graph has
degree 3 or 4. In Case 4, if there exists a vertex v of degree 3, then we can have
four subcases, to be treated in the following order. Let x, y and z be the
neighbors of v. In Subcase 1, if there exists an edge between two neighbors of v,
say x and y, then we ramify the node of the tree into two sons to add N(v) and
N(z) to the partial cover. In Subcase 2, if a pair of neighbors of v, say x and
y, share another common neighbor a (different from v), then we ramify the node
of the tree into two sons to add N(v) and {v, a} to the partial cover. In
Subcase 3, if a neighbor of v, say x, has at least three neighbors different
from v, then we ramify the node of the tree into three sons to add N(v), N(x)
and {x} ∪ N({y, z}) to the partial cover. In Subcase 4, the neighbors of v have
exactly two private neighbors each, not counting vertex v itself. Let x be a
neighbor of v and let a and b be the neighbors of x other than v; then we ramify
the node of the tree into three sons to add N(v), {v, a, b} and N({y, z, a, b})
to the partial cover.

In Case 5, we have a 4-regular graph and we can have three subcases, to be
tested in the following order. Let v be a vertex of the graph and x, y, z and w
its neighbors. In Subcase 1, if there exists an edge between two neighbors of
v, say x and y, then we ramify the node of the tree into three sons to add N(v),
N(z) and {z} ∪ N(w) to the partial cover. In Subcase 2, if three neighbors of v,
say x, y and z, share a common neighbor a, then we ramify the node of the tree
into two sons to add N(v) and {v, a} to the partial cover. In Subcase 3, if each
of the neighbors of v has three neighbors different from v, then we ramify the
node of the tree into four sons to add N(v), N(y), {y} ∪ N(w) and
{y, w} ∪ N({x, z}) to the partial cover.

Contrary to Algorithm B1, a node in the search tree can be ramified into two,
three or four sons, and the partial cover can increase by up to 8 vertices,
depending on the selected case. Algorithm B2 spends $O(kn + 1.324718^k k^2)$
time to solve the k-Vertex Cover problem.

3.3 Algorithm of Cheetham et al. 

The CGM algorithm proposed by Cheetham et al. [4] to solve the k-Vertex Cover
problem parallelizes both phases of an FPT algorithm, reduction to problem
kernel and bounded search tree. Previous works designed for the PRAM model
parallelize only the method of reduction to problem kernel [4]. However, as
implementations of FPT algorithms usually spend minutes in the reduction to
problem kernel and hours, or maybe even days, in the bounded search tree, the
parallelization of the bounded search tree designed in the CGM algorithm is an
important contribution.

The CGM algorithm of Cheetham et al. [4] solves even larger instances of
the k-Vertex Cover problem than those solved by sequential FPT algorithms.
The implementation of this algorithm can solve instances with k ≥ 400 in less
than 75 minutes of processing time. It is important to emphasize that the
k-Vertex Cover problem is considered well solved for instances with k ≤ 200
(sequential FPT algorithms) [7]. Not only is there a considerable increase in
the parameter k, but it is also important to recall that the time of an FPT
algorithm grows exponentially in relation to k.

The phase of reduction to problem kernel is parallelized through a parallel
integer sorting. The p processors that participate in the parallel sort are
identified as P_i, 0 ≤ i ≤ p − 1. To identify the vertices of the graph with
degree larger than k, the edges are sorted by the label of the vertex they are
incident with through deterministic sample sort [10], which requires O(1)
parallel integer sorts, i.e., constant time. The partial vertex cover (vertices
with degree larger than k) and the instance (G', k') are sent to all the
processors.

The basic idea of the parallelization of the phase of bounded search tree
is to generate a complete ternary tree T with $O(\log_3 p)$ tree levels and p
leaf nodes (γ_0, ..., γ_{p−1}). Each one of these p leaf nodes is then assigned
to one of the p processors, which searches locally for a solution in the subtree
generated from the leaf node γ_i, as shown in Fig. 1. A detailed description of
this phase is presented in the following.

Fig. 1. A processor P_i computes the unique path in T from the root to leaf γ_i,
using Algorithm B1. Then, P_i computes the entire subtree below γ_i, using
Algorithm B2.



— Consider the ternary search tree T. Each processor P_i, 0 ≤ i < p, starts this
phase with the instance obtained at the previous phase, (G', k'), and uses
Algorithm B1 to compute the unique path in T from the tree root to the
leaf node γ_i. Let (G'', k'') be the instance computed at the leaf node γ_i.

— Each processor P_i, 0 ≤ i < p, searches locally for a solution in the subtree
generated from (G'', k''), based on Algorithm B2. Processor P_i chooses a son
of the node at random and expands it until a solution is found or the partial
cover is larger than k. If a solution is not found, it returns to the subtree to
pick a son not yet explored, until the whole subtree is traversed. If a solution
is encountered, the other processors are notified to interrupt.
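One way a processor could determine its own root-to-leaf path in the complete ternary tree T without any communication is to read the base-3 digits of its index. This mapping is an assumption made for illustration and is not stated in the description above; the function name `leaf_path` is hypothetical, and the sketch requires p to be a power of 3.

```python
def leaf_path(i, p):
    """Son choices (0, 1 or 2) from the root to leaf i of a complete ternary tree with p leaves."""
    depth = 0
    while 3 ** depth < p:
        depth += 1
    if 3 ** depth != p or not 0 <= i < p:
        raise ValueError("p must be a power of 3 and 0 <= i < p")
    digits = []
    for _ in range(depth):
        digits.append(i % 3)   # least significant base-3 digit first
        i //= 3
    return digits[::-1]        # most significant digit is the choice at the root


# With 27 processors every path has three branching steps.
print([leaf_path(i, 27) for i in (0, 5, 26)])
```

At each level the processor would apply the three-way branching of Algorithm B1 and keep only the son selected by the corresponding digit, so the p processors end at p distinct leaves.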

In the algorithm of Cheetham et al. [4], the major part of the computational
work occurs when each processor P_i, 0 ≤ i < p, computes locally the search tree
from (G'', k''), where Algorithm B2 is used. As all the p subtrees are traversed
simultaneously, it is possible that the parallel algorithm visits nodes that the
sequential algorithm would not visit.

4 Implementation Details

In this section we present some implementation details of the parallel FPT
algorithm and discuss the data structures utilized in our implementation. We use
C/C++ and the MPI communication library.

The program receives as input a text file describing a graph G by its adjacency 
list and an integer k that determines the maximum size for the vertex cover 
desired. Let n be the number of vertices and m the number of edges of graph G 
and p the number of processors to run the program. 

At the beginning of the reduction to problem kernel phase, the input
adjacency list of graph G is transformed into a list of corresponding edges and
distributed among the p processors. Each processor P_i, 0 ≤ i < p, receives m/p
edges and is responsible for controlling the degrees of n/p vertices.

Each processor sorts the edges received by the identifier of the first vertex
they are incident with, and obtains the degree of such vertices. Notice that it
is possible for a processor to compute the degree of vertices that are the
responsibility of another processor. In this case, the results are sent to the
corresponding processor.

After this communication, the p processors can identify the local vertices
with degree larger than k and send this information to the others, so that each
processor can remove the local edges incident with these vertices. All the
edges remaining after the removal, which form the new graph G', are sent to all
the processors. In this way, at the end of this phase, each processor has the
instance generated by the method of reduction to problem kernel and the partial
cover (vertices of degree larger than k), that is, the root of the bounded search
tree. The p processors transform the list of edges corresponding to graph G'
back into an adjacency list, which will be used in the next phase.

The resulting adjacency list from the reduction to problem kernel is
implemented as a doubly linked list of vertices. Each node x of this list of
vertices contains a pointer to a doubly linked list of pointers, whose elements
represent all the edges incident with x, which we call, for simplicity, the list
of edges of x. Each node of the list of edges of x points to the node of the list
of vertices that contains the other endpoint of the edge. Although the graph is
not directed, each edge is represented twice, in two distinct lists of edges.
Thus each node of the list of edges also contains a pointer to its other
representation. In Fig. 2 we present an example of a graph and the data
structure to store it.

The insertion of a new element in the list of vertices takes O(n) time, since
it is necessary to check whether such an element already exists. In the list of
edges, the insertion of a new element, in case it does not yet exist, results in
the insertion of elements in the two lists of edges of its two endpoints and
also takes O(n) time.

The removal of a vertex or an edge is a rearrangement of the pointers of
the previous and next elements of the list. They are not actually deallocated
from memory; they are only unlinked from the list. Notice that the edges incident
with a vertex are removed automatically with the vertex. However, we still have
to remove the other representation of each such edge. As each edge has a pointer
to its other representation, we spend O(1) time to remove it from the list of
edges of the other vertex. Therefore, we spend O(k) time to remove a vertex from
the list, since the vertices of the graph have degree bounded by k. In our
implementation, we store in memory only the data relative to the node of the
bounded search tree being worked on.

Fig. 2. The data structure used to store the graph G.

Since we use depth first traversal in the bounded search tree, we need to 
store some information that enables us to go up the tree and recover a previous 
instance of the graph. Thus our program uses the backtracking technique. Such 
information is stored in a stack of pointers to removed vertices and edges. Adding
an element to the stack takes O(1) time. Removing an element from the stack
and putting it back in the graph also takes O(1) time, since the removed vertex
or edge has pointers to the previous and next elements in the list.

The partial vertex cover is also a stack of pointers to vertices known to be
part of the cover. Adding or removing an element from the cover takes O(1) time.
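The backtracking idea can be sketched with a simplified structure. This sketch swaps the doubly linked lists described above for Python dictionaries and sets, so it illustrates the undo stack rather than the pointer manipulation; the class name `Graph` and its methods are assumptions for illustration. Removed vertices are never copied or rebuilt: each removal is recorded on a stack and undone in reverse order when the search goes back up the tree.

```python
class Graph:
    def __init__(self, edges):
        self.adj = {}
        for u, v in edges:
            self.adj.setdefault(u, set()).add(v)
            self.adj.setdefault(v, set()).add(u)
        self.undo = []  # stack of (vertex, its neighbors at removal time)

    def remove_vertex(self, v):
        """Unlink v and its incident edges, remembering how to restore them."""
        nbrs = self.adj.pop(v)
        for u in nbrs:
            self.adj[u].discard(v)   # remove the other representation of the edge
        self.undo.append((v, nbrs))

    def restore(self):
        """Undo the most recent removal (must be called in LIFO order)."""
        v, nbrs = self.undo.pop()
        self.adj[v] = nbrs
        for u in nbrs:
            self.adj[u].add(v)

    def edge_count(self):
        return sum(len(n) for n in self.adj.values()) // 2


g = Graph([(1, 2), (2, 3), (3, 1), (3, 4)])
g.remove_vertex(3)
g.remove_vertex(1)
print(g.edge_count())   # all edges are covered by {3, 1}
g.restore()
g.restore()
print(g.edge_count())   # back to the original graph
```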

At the beginning of the bounded search tree phase, all the p processors
contain the instance (G', k') and the partial vertex cover resulting from the
phase of reduction to problem kernel. As seen in Section 3.3, there exists a
bounded complete ternary search tree T with p leaf nodes. Each processor P_i,
0 ≤ i < p, uses Algorithm B1 to generate the unique path in tree T from the root
to the leaf node γ_i. Then, each processor P_i applies Algorithm B2 in the
subtree whose root is the leaf node γ_i, until finding a solution or finishing
the traversal.

In Algorithm B1, we search for a path that starts at a vertex and passes through
at most three edges. In our implementation, this initial vertex is always the
first vertex of the list and, therefore, the same tree T is generated in all the
executions of the program.

In the implementation of Algorithm B2, to obtain constant time for the
selection of a vertex for the cases of this algorithm, we use 6 auxiliary lists of
pointers to organize the vertices of the graph according to their degree (0, 1,
2, 3, 4 and 5 or more). Furthermore, each vertex of the graph also has a pointer
to its representative in the list of degrees, so any change in the degree of a
vertex takes O(1) time to update in the list of degrees.
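The six degree lists can be sketched as follows. The sketch uses Python sets instead of linked lists with back pointers, which gives expected rather than worst-case constant time; the class name `DegreeBuckets`, its methods and the `CASE_ORDER` table are assumptions for illustration. The table encodes the order in which Algorithm B2 examines the degrees: 1, 2, 5 or more, 3, and finally 4.

```python
# (case number of Algorithm B2, index of the degree bucket it inspects)
CASE_ORDER = [(1, 1), (2, 2), (3, 5), (4, 3), (5, 4)]


class DegreeBuckets:
    def __init__(self, degrees):
        self.bucket = [set() for _ in range(6)]   # degrees 0..4 and "5 or more"
        self.degree = dict(degrees)
        for v, d in self.degree.items():
            self.bucket[min(d, 5)].add(v)

    def set_degree(self, v, d):
        """Move v to the bucket of its new degree."""
        self.bucket[min(self.degree[v], 5)].discard(v)
        self.degree[v] = d
        self.bucket[min(d, 5)].add(v)

    def pick(self, b):
        """Any vertex from bucket b, or None if the bucket is empty."""
        return next(iter(self.bucket[b]), None)

    def next_case(self):
        """First applicable case and a vertex for it, or None if no case applies."""
        for case, b in CASE_ORDER:
            v = self.pick(b)
            if v is not None:
                return case, v
        return None


buckets = DegreeBuckets({"a": 1, "b": 7, "c": 3})
print(buckets.next_case())
buckets.set_degree("a", 0)
print(buckets.next_case())
```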

5 Experimental Results

We present the experimental results obtained by implementing the CGM algorithm
of Cheetham et al. [4], using the data structures and the description of the
previous section. Our parallel implementation will be called Par-Impl.
Furthermore, we also implemented Algorithm B2 in C/C++, to be called Seq-Impl.

The computational environment is a Beowulf cluster of 64 Pentium III
processors, each with 500 MHz and 256 MB of RAM. All the nodes
are interconnected by a Gigabit Ethernet switch. We used Linux Red Hat 7.3
with g++ 2.96 and MPI/LAM 6.5.6.

The sequential times were measured as wall clock times in seconds, including
reading input data, data structures deallocation and writing output data. The
parallel times were also measured as wall clock times, between the start of the
first process and the termination of the last process, including I/O operations
and data structures deallocation.

In our experiments we used conflict graphs that were kindly provided by
Professor Frank Dehne (Carleton University). These graphs represent amino acid
sequences collected from the NCBI database. They are Somatostatin, WW,
Kinase, SH2 (src-homology domain 2) and PHD (pleckstrin homology domain).
Table 1 shows a summary of the characteristics of these graphs (name,
number of vertices, number of edges, size of desired cover and size of the cover
to search for after the reduction to problem kernel).



Table 1. Sequences and corresponding graphs and cover sizes used in experiments.

Graph          |V|    |E|      k     k'
Kinase         647    113122   495   391
PHD            670    147054   601   600
SH2            730    95463    461   397
Somatostatin   559    33652    272   254
WW             425    40182    322   318



In Fig. 3 we compare the times obtained by executing Seq-Impl and Par-Impl
on a single processor (3 virtual processors) and Par-Impl on 27 processors.
To run Par-Impl on a single processor we used the MPI/LAM simulation mode, which
simulates p virtual processors as independent processes on the same physical
processor. The time obtained by Par-Impl on a single processor is the sum of the
wall clock times of the individual processes plus the overhead created by their
communication. The tests were carried out for the graphs PHD, Somatostatin
and WW. These input data were chosen because their sequential times are
reasonable. To obtain the averages, we ran Seq-Impl 10 times for each data set
and Par-Impl 30 times for each data set. Even though we used a single
processor to run the parallel implementation, its time was significantly
smaller. This is explained by having more distinct starting points in the
bounded search tree, so that from one of them we can find a path that leads
to the cover more quickly.

Fig. 3. Comparison of sequential and parallel times.

In Fig. 4 we show the average of the parallel times obtained on 27 processors.
Our parallel implementation can solve problem instances of size k ≥ 400 in
less than 3 minutes. For example, graph PHD (k = 601) can be solved in less
than 1 minute. Notice that the k-Vertex Cover problem is considered well solved
for instances with k ≤ 200 by sequential FPT algorithms [7]. It is important to
emphasize that the time of an FPT algorithm grows exponentially in relation to
k. Again we use 30 time samples to get the average time. Comparing the times
obtained with Table 1, we see that the parallel wall clock times do not
strictly increase with either k or k'. This leads us to conclude that the graph
structure is responsible for the running time.




Fig. 4. Average wall clock times for the data sets on 27 real processors.

The parallel times, using 3, 9 and 27 processors for the graphs PHD,
Somatostatin and WW are shown in Fig. 5. Notice that the increase in the number
of processors does not necessarily imply a greater improvement in the
average time, although a time reduction was always observed. Nevertheless, the
use of more processors increases the chance of determining the cover more
quickly, since we start the tree search at more points. Furthermore, it seems
that the number of tree nodes with a solution also has some influence on the
running times. As we do the depth first traversal in the bounded search tree, a
wrong choice of a son to visit means that we have to traverse the whole subtree
of that son before choosing another son to visit.




Fig. 5. Average wall clock times on 3, 9 and 27 processors for PHD, Somatostatin
and WW.



For the graphs PHD, SH2, Somatostatin and WW we could guarantee, in
less than 75 minutes, the nonexistence of covers smaller than those determined
by the parallel algorithm, confirming the minimality of the values obtained. For
this, all the possible nodes of the bounded search tree were generated. For the 
graph Kinase this was not possible in an acceptable time. 

Our results were compared with those presented in Cheetham et al. [4], who 
used a Beowulf Cluster of 32 Xeon nodes of 1.8 GHz and 512 MB of RAM. All 
the nodes were interconnected by a Gigabit Ethernet switch. Every node was 
running Linux Red Hat 7.2 with gcc 2.95.3 and MPI/LAM 6.5.6.

Our experiments are very relevant, since we used a computational platform
that is much inferior to that used in Cheetham et al. [4]. The parallel times
obtained in our experiments were better. We consider that the choice of good
data structures and the use of the backtracking technique were essential to
obtain these results. For the graphs Kinase and SH2 we obtained parallel times
that are much better, a reduction by a factor of approximately 115. The time
for the graph PHD was around 16 times better. For the graphs Somatostatin
and WW the times are slightly better. As we did not have access to the
implementation of Cheetham et al. [4], we tested several data structures in our
implementation. In the final version we used the one that gave the
best performance, together with the backtracking technique. More details can
be found in Hanashiro [11].

Furthermore, the sizes of the covers obtained were smaller for the following
graphs: Kinase (from 497 to 495), PHD (from 603 to 601) and Somatostatin
(from 273 to 272). It is important to emphasize that a reduction in the size
of the cover implies a reduction in the universe of existing solutions in the
bounded search tree, which in turn gives rise to an increase in the running time.



6 Conclusion 

FPT algorithms constitute an alternative approach to solve NP-complete
problems for which it is possible to fix a parameter that is responsible for the
combinatorial explosion. The use of parallelism significantly improves the
running time of FPT algorithms, as in the case of the k-Vertex Cover problem.

In the implementation of the presented CGM algorithm, the choice of the 
data structures and the use of the backtracking technique were essential to ob- 
tain the relevant experimental results. During the program design, we utilized 
several alternative data structures and their results were compared with those 
of Cheetham et al. [4]. Then we chose the design that obtained the best perfor- 
mance. Unfortunately we did not have access to the implementation of Cheetham 
et al. to compare it with our code. 

We obtained great improvements on the running times as compared to those 
of Cheetham et al. [4] . This is more significant if we take into account the fact 
that we used an inferior computational environment. Furthermore, we improved 
the values for the minimum cover and guaranteed the minimality for some of 
the graphs. 

The speedups of our implementation relative to that of Cheetham et al. [4] vary
considerably. The probable cause may lie in the structures of the input
graphs, and also in the number of solutions and how these solutions are
distributed among the nodes of the bounded search tree.

For the input used, only for the Thrombin graph did we not obtain better
average times, as compared to those of Cheetham et al. [4]. To improve the
results, we experimented with two other implementations, introducing randomness
in some of the choices. With these modifications, in more experiments we got
lower times for the Thrombin graph, though we did not improve the average.
For some of the graphs, the modification increases the times obtained, and does
not justify its usage.

Abstract. Computing a shortest path from one node to another in a
directed graph is a very common task in practice. This problem is classically
solved by Dijkstra’s algorithm. Many techniques are known to speed
up this algorithm heuristically, while optimality of the solution can still
be guaranteed. In most studies, such techniques are considered individually.
The focus of our work is the combination of speed-up techniques
for Dijkstra’s algorithm. We consider all possible combinations of four
known techniques, namely goal-directed search, bi-directed search, multi-level
approach, and shortest-path bounding boxes, and show how these
can be implemented. In an extensive experimental study we compare
the performance of different combinations and analyze how the techniques
harmonize when applied jointly. Several real-world graphs from
road maps and public transport and two types of generated random
graphs are taken into account.



1 Introduction 

We consider the problem of (repetitively) finding single-source single-target 
shortest paths in large, sparse graphs. Typical applications of this problem in- 
clude route planning systems for cars, bikes, and hikers [1,2] or scheduled ve- 
hicles like trains and buses [3,4], spatial databases [5], and web searching [6]. 
Besides the classical algorithm by Dijkstra [7], with a worst-case running time of 
$O(m + n \log n)$ using Fibonacci heaps [8], there are many recent algorithms that
solve variants and special cases of the shortest-path problem with better running 
time (worst-case or average-case; see [9] for an experimental comparison, [10] for 
a survey and some more recent work [11,12,13]). 

It is common practice to improve the running time of Dijkstra’s algorithm 
heuristically while correctness of the solution is still provable, i.e., it is guaranteed 
that a shortest path is returned but not that the modified algorithm is faster. 
In particular, we consider the following four speed-up techniques: 

* This work was partially supported by the Human Potential Programme of the Euro- 
pean Union under contract no. HPRN-CT-1999-00104 (AMORE) and by the DFG 
under grant WA 654/12-1. 

Goal-Directed Search modifies the given edge weights to favor edges leading
towards the target node [14,15]. With graphs from timetable information, a
speed-up in running time of a factor of roughly 1.5 is reported in [16].

Bi-Directed Search starts a second search backwards, from the target to the
source (see [17], Section 4.5). Both searches stop when their search horizons 
meet. Experiments in [18] showed that the search space can be reduced by a 
factor of 2, and in [19] it was shown that combinations with the goal-directed 
search can be beneficial. 

Multi-Level Approach takes advantage of hierarchical coarsenings of the giv- 
en graph, where additional edges have to be computed. They can be regarded 
as distributed to multiple levels. Depending on the given query, only a small 
fraction of these edges has to be considered to find a shortest path. Using 
this technique, speed-up factors of more than 3.5 have been observed for road 
map and public transport graphs [20]. Timetable information queries could 
be improved by a factor of 11 (see [21]), and also in [22] good improvements 
for road maps are reported. 

Shortest-Path Bounding Boxes provide a necessary condition for each edge,
if it has to be respected in the search. More precisely, the bounding box of 
all nodes that can be reached on a shortest path using this edge is given. 
Speed-up factors in the range between 10 and 20 can be achieved [23]. 

Goal-directed search and shortest-path bounding boxes are only applicable if a 
layout of the graph is provided. Multi-level approach and shortest-path bounding 
boxes both require a preprocessing, calculating additional edges and bounding 
boxes, respectively. All these four techniques are tailored to Dijkstra’s algorithm. 
They crucially depend on the fact that Dijkstra’s algorithm is label-setting and 
that it can be terminated when the destination node is settled. 

The focus of this paper is the combination of the four speed-up techniques. 
We first show that, with more or less effort, all $2^4 = 16$ combinations can be
implemented. Then, an extensive experimental study of their performance is 
provided. Benchmarks were run on several real-world and generated graphs, 
where operation counts as well as CPU time were measured. 

The next section contains, after some definitions, a description of the speed-
up techniques and shows how to combine them. Section 3 presents the experi-
mental setup and data sets for our statistics, and the corresponding results are given
in Section 4. Finally, Section 5 gives some conclusions.

2 Definitions and Problem Description 

2.1 Definitions 

A directed simple graph $G$ is a pair $(V,E)$, where $V$ is the set of nodes and
$E \subseteq V \times V$ the set of edges in $G$. Throughout this paper, the number of nodes,
$|V|$, is denoted by $n$ and the number of edges, $|E|$, by $m$.

A path in $G$ is a sequence of nodes $u_1, \dots, u_k$ such that $(u_i,u_{i+1}) \in E$ for all
$1 \le i < k$. Given non-negative edge lengths $l : E \to \mathbb{R}_0^+$, the length of a path
$u_1, \dots, u_k$ is the sum of weights of its edges, $\sum_{i=1}^{k-1} l(u_i,u_{i+1})$. The (single-source
single-target) shortest-path problem consists of finding a path of minimum length
from a given source $s \in V$ to a target $t \in V$.

A graph layout is a mapping $L : V \to \mathbb{R}^2$ of the graph’s nodes to the
Euclidean plane. For ease of notation, we will identify a node $v \in V$ with its
location $L(v)$ in the plane. The Euclidean distance between two nodes $u,v \in V$
is then denoted by $d(u,v)$.

2.2 Speed-Up Techniques 

Our base algorithm is Dijkstra’s algorithm using Fibonacci heaps as priority 
queue. In this section, we provide a short description of the four speed-up tech- 
niques, whose combinations are discussed in the next section. 
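
As a point of reference for the techniques below, the base algorithm with early termination at the target can be sketched as follows. This is only an illustration: it uses Python's binary heap (`heapq`) rather than Fibonacci heaps, and the adjacency-list format and the name `dijkstra` are assumptions of this sketch, not part of the implementation evaluated in this paper.

```python
import heapq


def dijkstra(graph, s, t):
    """Return (s-t distance, number of settled nodes).

    graph: dict mapping node -> list of (neighbor, non-negative length).
    The distance is float('inf') if t cannot be reached from s.
    """
    dist = {s: 0.0}
    settled = set()
    queue = [(0.0, s)]
    while queue:
        d, u = heapq.heappop(queue)
        if u in settled:
            continue  # stale queue entry, u was settled with a smaller label
        settled.add(u)
        if u == t:
            return d, len(settled)  # label-setting: the label of t is final
        for v, length in graph.get(u, []):
            nd = d + length
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(queue, (nd, v))
    return float("inf"), len(settled)


# Hypothetical toy graph, used only to exercise the function.
example = {
    "s": [("a", 1.0), ("b", 4.0)],
    "a": [("b", 2.0), ("t", 6.0)],
    "b": [("t", 1.0)],
    "t": [],
}
print(dijkstra(example, "s", "t"))
```

The number of settled nodes is returned alongside the distance because the search space is one of the two performance measures used in the experiments.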



Goal-Directed Search. This technique uses a potential function on the node
set. The edge lengths are modified in order to direct the graph search towards
the target. Let $\lambda$ be such a potential function and $l(e)$ be the length of $e$. The
new length of an edge $(v,w)$ is defined to be $\bar{l}(v,w) := l(v,w) - \lambda(v) + \lambda(w)$.
The potential must fulfill the condition that for each edge $e$, its new edge length
$\bar{l}(e)$ is non-negative, in order to guarantee optimal solutions.

In case edge lengths are Euclidean distances, the Euclidean distance $d(u,t)$
of a node $u$ to the target $t$ is a valid potential, due to the triangle inequality.
Otherwise, a potential function can be defined as follows: let $v_{\max}$ denote the
maximum “edge-speed” $d(u,v)/l(e)$, over all edges $e = (u,v)$. The potential of
a node $u$ can now be defined as $\lambda(u) = d(u,t)/v_{\max}$.
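
A minimal sketch of goal-directed search with the potential $d(u,t)/v_{\max}$ is given below. The data layout (adjacency lists plus a `pos` dictionary of coordinates) and the function names are assumptions made for illustration, and edges of length zero are assumed not to occur.

```python
import heapq
import math


def euclid(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])


def goal_directed(graph, pos, s, t):
    """Dijkstra on the modified lengths l(v,w) - lam(v) + lam(w).

    graph: node -> list of (neighbor, length); pos: node -> (x, y).
    Returns (s-t distance w.r.t. the original lengths, settled nodes).
    """
    # Maximum "edge speed" d(u,v)/l(e) over all edges.
    v_max = max(euclid(pos[u], pos[v]) / l
                for u in graph for v, l in graph[u])
    lam = {u: euclid(pos[u], pos[t]) / v_max for u in pos}

    dist = {s: 0.0}
    settled = set()
    queue = [(0.0, s)]
    while queue:
        d, u = heapq.heappop(queue)
        if u in settled:
            continue
        settled.add(u)
        if u == t:
            # Every s-t path is shifted by lam(t) - lam(s); undo the shift.
            return d + lam[s] - lam[t], len(settled)
        for v, l in graph[u]:
            nd = d + l - lam[u] + lam[v]  # modified length, non-negative
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(queue, (nd, v))
    return math.inf, len(settled)


# Hypothetical layout: four nodes on a line, a - s - b - t.
pos = {"a": (-1.0, 0.0), "s": (0.0, 0.0), "b": (1.0, 0.0), "t": (2.0, 0.0)}
graph = {
    "a": [("s", 1.0)],
    "s": [("a", 1.0), ("b", 1.0)],
    "b": [("s", 1.0), ("t", 1.0)],
    "t": [("b", 1.0)],
}
print(goal_directed(graph, pos, "s", "t"))
```

On this toy instance the edge leading away from the target receives a larger modified length, so the node behind the source is not settled before the target is reached.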



Bi-Directed Search. The bi-directed search simultaneously applies the “nor- 
mal” , or forward, variant of the algorithm, starting at the source node, and a 
so-called reverse, or backward, variant of Dijkstra’s algorithm, starting at the 
destination node. With the reverse variant, the algorithm is applied to the re- 
verse graph, i.e., a graph with the same node set V as that of the original graph, 
and the reverse edge set $\bar{E} = \{(u,v) \mid (v,u) \in E\}$. Let $d_f(u)$ be the distance
labels of the forward search and $d_b(u)$ the labels of the backward search, respec-
tively. The algorithm can be terminated when one node has been designated to
be permanent by both the forward and the reverse algorithm. Then the shortest
path is determined by the node $u$ with minimum value $d_f(u) + d_b(u)$ and can
be composed of the one from the start node to u, found by the forward search, 
and the edges reverted again on the path from the destination to u, found by 
the reverse search. 
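
The termination rule and the final minimisation over $d_f(u) + d_b(u)$ can be sketched as follows. The strict alternation between the two searches and all names are assumptions of this sketch; the text above does not prescribe how the two searches are interleaved.

```python
import heapq
import math


def bidirectional_dijkstra(graph, s, t):
    """graph: node -> list of (neighbor, length). Returns the s-t distance."""
    # Reverse graph: same node set, every edge (v, u) becomes (u, v).
    reverse = {u: [] for u in graph}
    for v in graph:
        for u, l in graph[v]:
            reverse[u].append((v, l))

    adj = (graph, reverse)
    dist = ({s: 0.0}, {t: 0.0})  # forward labels d_f, backward labels d_b
    settled = (set(), set())
    queues = ([(0.0, s)], [(0.0, t)])
    side = 0
    while queues[0] and queues[1]:
        d, u = heapq.heappop(queues[side])
        if u not in settled[side]:
            settled[side].add(u)
            if u in settled[1 - side]:
                break  # u is permanent in both searches: stop
            for v, l in adj[side][u]:
                nd = d + l
                if nd < dist[side].get(v, math.inf):
                    dist[side][v] = nd
                    heapq.heappush(queues[side], (nd, v))
        side = 1 - side  # alternate between forward and backward search
    # The meeting node need not lie on a shortest path, so take the
    # minimum of d_f(u) + d_b(u) over all nodes labelled by both searches.
    return min((dist[0][u] + dist[1][u] for u in dist[0] if u in dist[1]),
               default=math.inf)


# Hypothetical toy graph, used only to exercise the function.
example = {
    "s": [("a", 1.0), ("b", 4.0)],
    "a": [("b", 2.0), ("t", 6.0)],
    "b": [("t", 1.0)],
    "t": [],
}
print(bidirectional_dijkstra(example, "s", "t"))
```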



Multi-Level Approach. This speed-up technique requires a preprocessing step 
at which the input graph $G = (V,E)$ is decomposed into $l + 1$ ($l \ge 1$) levels
and enriched with additional edges representing shortest paths between certain
nodes. This decomposition depends on subsets $S_i$ of the graph’s node set for
each level, called selected nodes at level $i$: $S_0 := V \supseteq S_1 \supseteq \dots \supseteq S_l$. These
node sets can be determined on diverse criteria; with our implementation, they
consist of the desired numbers of nodes with highest degree in the graph, which
has turned out to be an appropriate criterion [20].

There are three different types of edges being added to the graph: upward 
edges, going from a node that is not selected at one level to a node selected at 
that level, downward edges, going from selected to non-selected nodes, and level 
edges, passing between selected nodes at one level. The weight of such an edge 
is assigned the length of a shortest path between the end-nodes. 

To find a shortest path between two nodes, then, it suffices for Dijkstra’s 
algorithm to consider a relatively small subgraph of the “multi-level graph” (a 
certain set of upward and of downward edges and a set of level edges passing 
at a maximal level that has to be taken into account for the given source and 
target nodes). 



Shortest-Path Bounding Boxes. This speed-up technique requires a prepro-
cessing computing all shortest path trees. For each edge $e \in E$, we compute the
set $S(e)$ of those nodes to which a shortest path starts with edge $e$. Using a
given layout, we then store for each edge $e \in E$ the bounding box of $S(e)$ in an
associative array $BB$ with index set $E$.

It is then sufficient to perform Dijkstra’s algorithm on the subgraph induced 
by the edges $e \in E$ with the target node included in $BB[e]$. This subgraph can
be determined on the fly, by excluding all other edges in the search. (One can 
think of bounding boxes as traffic signs which characterize the region that they 
lead to.) 
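
Both the preprocessing and the pruned query can be sketched in a few lines. The sketch below is an illustration under assumptions: it builds one shortest-path tree per node with a plain Dijkstra run, stores boxes as `(xmin, ymin, xmax, ymax)` tuples in a dictionary `bb` keyed by edge, and all function names are hypothetical.

```python
import heapq
import math


def first_edges(graph, r):
    """Dijkstra from r; map each reached node to the first edge of its tree path."""
    dist = {r: 0.0}
    first = {}
    settled = set()
    queue = [(0.0, r)]
    while queue:
        d, u = heapq.heappop(queue)
        if u in settled:
            continue
        settled.add(u)
        for v, l in graph[u]:
            nd = d + l
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                first[v] = (r, v) if u == r else first[u]
                heapq.heappush(queue, (nd, v))
    return first


def bounding_boxes(graph, pos):
    """bb[e] = bounding box of the nodes reached on a tree path starting with e."""
    bb = {}
    for r in graph:
        for v, e in first_edges(graph, r).items():
            x, y = pos[v]
            x0, y0, x1, y1 = bb.get(e, (x, y, x, y))
            bb[e] = (min(x0, x), min(y0, y), max(x1, x), max(y1, y))
    return bb


def query(graph, pos, bb, s, t):
    """Dijkstra that skips every edge whose box does not contain the target."""
    tx, ty = pos[t]
    dist = {s: 0.0}
    settled = set()
    queue = [(0.0, s)]
    while queue:
        d, u = heapq.heappop(queue)
        if u in settled:
            continue
        settled.add(u)
        if u == t:
            return d, len(settled)
        for v, l in graph[u]:
            box = bb.get((u, v))
            if box is None or not (box[0] <= tx <= box[2]
                                   and box[1] <= ty <= box[3]):
                continue  # this edge cannot start a shortest path to t
            nd = d + l
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(queue, (nd, v))
    return math.inf, len(settled)


# Hypothetical layout: four nodes on a line, a - s - b - t.
pos = {"a": (-1.0, 0.0), "s": (0.0, 0.0), "b": (1.0, 0.0), "t": (2.0, 0.0)}
graph = {
    "a": [("s", 1.0)],
    "s": [("a", 1.0), ("b", 1.0)],
    "b": [("s", 1.0), ("t", 1.0)],
    "t": [("b", 1.0)],
}
bb = bounding_boxes(graph, pos)
print(query(graph, pos, bb, "s", "t"))
```

The test inside `query` is a constant-time containment check per edge, which is why the subgraph can be determined on the fly.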

A variation of this technique has been introduced in [16], where as geometric 
objects angular sectors instead of bounding boxes were used, for application to a 
timetable information system. An extensive study in [23] showed that bounding 
boxes are the fastest geometric objects in terms of running time, and competitive 
with much more complex geometric objects in terms of visited nodes. 



2.3 Combining the Speed-Up Techniques 

In this section, we describe for every pair of speed-up techniques how we combined
them. The extension to a combination of three or four techniques is straight-
forward, once the problem of combining two of them is solved.



Goal-Directed Search and Bi-Directed Search. Combining goal-directed 
and bi-directed search is not as obvious as it may seem at first glance. A
counter-example in [18] shows that simply applying goal-directed search forward
and backward yields a wrong termination condition. However, the
alternative condition proposed there has been shown in [19] to be quite inefficient, 
as the search in each direction almost reaches the source of the other direction. 
This often results in a slower algorithm. 

To overcome these deficiencies, we simply use the very same edge weights
$\bar{l}(v,w) := l(v,w) - \lambda(v) + \lambda(w)$ for both the forward and the backward search.
With these weights, the forward search is directed to the target $t$ and the back-
ward search has no preferred direction, but favors edges that are directed to-
wards $t$. This should be (and indeed is) faster than each of the two speed-up
techniques alone. This combination computes a shortest path, because a shortest
$s$-$t$ path is the same for the given edge weights $l$ and for the edge weights
$\bar{l}$ modified according to goal-directed search.
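
The correctness claim can be checked with a short telescoping argument. Writing $\lambda$ for the potential and $\bar{l}(v,w) = l(v,w) - \lambda(v) + \lambda(w)$ for the modified lengths, every path $s = u_1, \dots, u_k = t$ satisfies:

```latex
\sum_{i=1}^{k-1} \bar{l}(u_i,u_{i+1})
  = \sum_{i=1}^{k-1} \bigl( l(u_i,u_{i+1}) - \lambda(u_i) + \lambda(u_{i+1}) \bigr)
  = \sum_{i=1}^{k-1} l(u_i,u_{i+1}) - \lambda(s) + \lambda(t)
```

All $s$-$t$ paths are therefore shifted by the same constant $\lambda(t) - \lambda(s)$, so their ranking by length is unchanged, and both searches operate on one consistent set of non-negative weights.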



Goal-Directed Search and Multi-Level Approach. As described in Sec- 
tion 2.2, the multi-level approach basically determines for each query a subgraph 
of the multi-level graph, on which Dijkstra’s algorithm is run to compute a short- 
est path. The computation of this subgraph does not involve edge lengths and 
thus goal-directed search can be simply performed on it. 



Goal-Directed Search and Shortest-Path Bounding Boxes. Similar to 
the multi-level approach, the shortest-path bounding boxes approach determines 
for a given query a subgraph of the original graph. Again, edge lengths are 
irrelevant for the computation of the subgraph and goal-directed search can be
applied directly.



Bi-Directed Search and Multi-Level Approach. Basically, bi-directed 
search can be applied to the subgraph defined by the multi-level approach. In 
our implementation, that subgraph is computed on the fly during Dijkstra’s al- 
gorithm: for each node considered, the set of necessary outgoing edges is deter- 
mined. If applying bi-directed search to the multi-level subgraph, a symmetric, 
backward version of the subgraph computation has to be implemented: for each 
node considered in the backward search, the incoming edges that are part of the 
subgraph have to be determined. 



Bi-Directed Search and Shortest-Path Bounding Boxes. In order to take 
advantage of shortest-path bounding boxes in both directions of a bi-directional 
search, a second set of bounding boxes is needed. For each edge $e \in E$, we
compute the set $S_b(e)$ of those nodes from which a shortest path ending with $e$
exists. We store for each edge $e \in E$ the bounding box of $S_b(e)$ in an associative
array $BB_b$ with index set $E$. The forward search checks whether the target is
contained in $BB(e)$, the backward search, whether the source is in $BB_b(e)$.



Multi-Level Approach and Shortest-Path Bounding Boxes. The multi-
level approach enriches a given graph with additional edges. Each new edge
$(u_1,u_k)$ represents a shortest path $(u_1,u_2,\dots,u_k)$ in $G$. We annotate such a
new edge $(u_1,u_k)$ with $BB(u_1,u_2)$, the associated bounding box of the first
edge on this path.

Table 1. Number of nodes and edges for all test graphs

street            n    1444   3045  16471  20466  25982  38823  45852  45073   51510   79456
                  m    3060   7310  34530  42288  57620  79988  98098  91314  110676  172374
public transport  n     409    705   1660   2279   2399   4598   6884  10815   12070   14335
                  m    1215   1681   4327   6015   8008  14937  18601  29351   33728   39887
planar            n    1000   2000   3000   4000   5000   6000   7000   8000    9000   10000
                  m    5000  10000  15000  20000  25000  30000  35000  40000   45000   50000
waxman            n     938   1974   2951   3938   4949   5946   6943   7917    8882    9906
                  m    4070   9504  14506  19658  24474  29648  34764  39138   44208   48730

3 Experimental Setup 

In this section, we provide details on the input data used, consisting of real-world 
and randomly generated graphs, and on the execution of the experiments. 



3.1 Data 

Real-World Graphs. In our experiments we included a set of graphs that stem 
from real applications. As in other experimental work, it turned out that using 
realistic data is quite important as the performance of the algorithms strongly 
depends on the characteristics of the data. 

Street Graphs. Our street graphs are street networks of US cities and their 
surroundings. These graphs are bi-directed, and edge lengths are Euclidean 
distances. The graphs are fairly large and very sparse because bends are rep- 
resented by polygonal lines. (With such a representation of a street network, 
it is possible to efficiently find the nearest point in a street by a point-to-point 
search.) 

Public Transport Graphs. A public transport graph represents a network 
of trains, buses, and other scheduled vehicles. The nodes of such a graph 
correspond to stations or stops, and there exists an edge between two nodes 
if there is a non-stop connection between the respective stations. The weight 
of an edge is the average travel time of all vehicles that contribute to this 
edge. In particular, the edge lengths are not Euclidean distances in this set 
of graphs. 



Random Graphs. We generated two sets of random graphs that have an 
estimated average out-degree of 2.5 (which corresponds to the average degree 
in the real-world graphs). Each set consists of ten connected, bi-directed graphs 
with (approximately) $1000 \cdot i$ nodes ($i = 1, \dots, 10$).

Random Planar Graphs. For the construction of random planar graphs, we
used a generator provided by LEDA [24]. A given number of $n$ nodes are uni-
formly distributed in a square with a lateral length of 1, and a triangulation
of the nodes is computed. This yields a complete undirected planar graph.
Finally, edges are deleted at random until the graph contains $2.5 \cdot n$ edges,
and each of these is replaced by two directed edges, one in either direction.

Random Waxman Graphs. The construction of these graphs is based on a
random graph model introduced by Waxman [25]. Input parameters are the
number of nodes $n$ and two positive rational numbers $\alpha$ and $\beta$. The nodes
are again uniformly distributed in a square of a lateral length of 1, and the
probability that an edge $(u,v)$ exists is $\beta \cdot \exp(-d(u,v)/(\sqrt{2}\alpha))$. Higher $\beta$
values increase the edge density, while smaller $\alpha$ values increase the den-
sity of short edges in relation to long edges. To ensure connectedness and
bi-directedness of the graphs, all nodes that do not belong to the largest
connected component are deleted (thus, slightly fewer than $n$ nodes remain)
and the graph is bi-directed by insertion of missing reverse edges. We set
$\alpha = 0.01$ and empirically determined that setting $\beta = 2.5 \cdot 1620/n$ yields an
average degree of 2.5, as desired.
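
The construction just described can be sketched as follows. The function name, the seeded generator, and the use of Euclidean distances as edge lengths are assumptions made for this illustration; it is not the generator used in the experiments.

```python
import math
import random


def waxman_graph(n, alpha, beta, seed=0):
    """Return (pos, graph) for a connected, bi-directed Waxman graph.

    pos: node -> (x, y) in the unit square.
    graph: node -> {neighbor: length}; lengths are Euclidean (an assumption).
    """
    rng = random.Random(seed)
    pos = {i: (rng.random(), rng.random()) for i in range(n)}
    graph = {i: {} for i in range(n)}
    for u in range(n):
        for v in range(n):
            if u == v:
                continue
            d = math.hypot(pos[u][0] - pos[v][0], pos[u][1] - pos[v][1])
            # Edge (u, v) exists with probability beta * exp(-d / (sqrt(2) * alpha)).
            if rng.random() < beta * math.exp(-d / (math.sqrt(2) * alpha)):
                graph[u][v] = d
                graph[v][u] = d  # insert the reverse edge as well
    # Keep only the largest connected component.
    seen, best = set(), []
    for start in range(n):
        if start in seen:
            continue
        component, stack = [], [start]
        seen.add(start)
        while stack:
            u = stack.pop()
            component.append(u)
            for v in graph[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        if len(component) > len(best):
            best = component
    keep = set(best)
    graph = {u: {v: l for v, l in graph[u].items() if v in keep} for u in keep}
    return {u: pos[u] for u in keep}, graph


# Parameters as in the text: alpha = 0.01 and beta = 2.5 * 1620 / n.
n = 1000
pos, graph = waxman_graph(n, 0.01, 2.5 * 1620 / n)
print(len(graph), sum(len(adj) for adj in graph.values()))
```

Because the surviving component is usually smaller than the requested $n$, the node counts of such graphs fall slightly below round numbers, as in the waxman rows of Table 1.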



3.2 Experiments 

We have implemented all combinations of speed-up techniques as described in 
Sections 2.2 and 2.3 in C++, using the graph and Fibonacci heap data structures
of the LEDA library [24] (version 4.4). The code was compiled with the GNU 
compiler (version 3.3), and experiments were run on an Intel Xeon machine with 
2.6 GHz and 2 GB of memory, running Linux (kernel version 2.4). 

For each graph and combination, we computed shortest paths for a set of
queries, measuring two types of performance: the mean values of the running times
(CPU time in seconds) and the number of nodes inserted in the priority queue.
The queries were chosen at random, and their number was determined
such that statistical relevance can be guaranteed (see also [23]).



4 Experimental Results 

The outcome of the experimental study is shown in Figures 1-4. Further dia- 
grams that we used for our analysis are depicted in Figures 5-10. Each combina- 
tion is referred to by a 4-tuple of shortcuts: go (goal-directed), bi (bi-directed), 
ml (multi-level), bb (bounding box), and xx if the respective technique is not 
used (e.g., go-bi-xx-bb). In all figures, the graphs are ordered by size, as listed 
in Table 1. 

Fig. 1. Speed-up relative to Dijkstra’s algorithm in terms of visited nodes for real-world
graphs (in this order: street graphs in red and public transport graphs in blue)

Fig. 2. Speed-up relative to Dijkstra’s algorithm in terms of visited nodes for generated
graphs (in this order: random planar graphs in yellow and random Waxman graphs in
green)

Fig. 3. Speed-up relative to Dijkstra’s algorithm in terms of running time for real-world
graphs (in this order: street graphs in red and public transport graphs in blue)

Fig. 4. Speed-up relative to Dijkstra’s algorithm in terms of running time for generated
graphs (in this order: random planar graphs in yellow and random Waxman graphs in
green)

We calculated two different values denoting relative speed-up: on the one
hand, Figures 1-4 show the speed-up that we achieved compared to plain
Dijkstra, i.e., for each combination of techniques the ratio of the performance of
plain Dijkstra and the performance of Dijkstra with the specific combination of
techniques applied. There are separate figures for real-world and random graphs,
for the number of nodes and running time, respectively.

On the other hand, for each of the Figures 5-8, we focus on one technique T 
and show for each combination containing T the speed-up that can be achieved 
compared to the combination without T. (Because of lack of space only figures 
dealing with the number of visited nodes are depicted.) For example, when focus- 
ing on bi-directed search and considering the combination go-bi-xx-bb, say, we 
investigate by which factor the performance gets better when the combination 
go-bi-xx-bb is used instead of go-xx-xx-bb only. 

In the following, we discuss, for each technique separately, how combinations 
with the specific technique behave, and then turn to the relation of the two 
performance parameters measured, the number of visited nodes and running 
time: we define the overhead of a combination of techniques to be the ratio of 
running time and the number of visited nodes. In other words, the overhead 
reflects the time spent per node. 
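
The three derived quantities can be made concrete with a small sketch. The measurements below are invented placeholders chosen only to show the arithmetic; they are not results from this study, and the function names are hypothetical.

```python
def speed_up(baseline, variant):
    """Ratio of a performance value without and with the technique(s)."""
    return baseline / variant


def overhead(running_time, visited_nodes):
    """Time spent per visited node."""
    return running_time / visited_nodes


# Placeholder measurements (invented for illustration only):
# combination -> (mean running time in seconds, mean number of visited nodes).
measured = {
    "xx-xx-xx-xx": (0.040, 20000),  # plain Dijkstra
    "go-xx-xx-bb": (0.006, 1500),
    "go-bi-xx-bb": (0.005, 1000),
}

plain_time, plain_nodes = measured["xx-xx-xx-xx"]
for combo, (seconds, nodes) in measured.items():
    print(combo,
          "speed-up vs. plain (nodes):", speed_up(plain_nodes, nodes),
          "overhead [s/node]:", overhead(seconds, nodes))

# Effect of adding bi-directed search to go-xx-xx-bb (second kind of speed-up).
print(speed_up(measured["go-xx-xx-bb"][1], measured["go-bi-xx-bb"][1]))
```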



4.1 Speed-Up of the Combinations 

Goal-Directed Search. Individually comparing goal-directed search with plain 
Dijkstra (Figure 5), speed-up varies a lot between the different types of graphs: 
Considering the random graphs, we get a speed-up of about 2 for planar graphs 
but of up to 5 for the Waxman graphs, which is quite surprising. Only little 
speed-up, of less than 2, can be observed for the real-world graphs. 

Concerning the number of visited nodes, adding goal-directed search to the 
multi-level approach is slightly worse than adding it to plain Dijkstra and with 
bi-directed search, we get another slight deterioration. Adding it to bounding 
boxes (and combinations including bounding boxes) is hardly beneficial. 

For real-world graphs, adding goal-directed search to any combination does 
not improve the running time. For generated graphs, however, running time 
decreases. In particular, it is advantageous to add it to a combination containing 
multi-level approach. We conclude that combining goal-directed search with the 
multi-level approach generally seems to be a good idea. 



Bi-Directed Search. Bi-directed search individually gives a speed-up of about 
1.5 for the number of visited nodes (see Figure 6) and for the running time, 
for all types of graphs. For combinations of bi-directed search with other speed- 
up techniques, the situation is different: For the generated graphs, neither the 
number of visited nodes nor the running time improves when bi-directed search 
is applied additionally to goal-directed search. However, running time improves 
with the combination containing the multi-level approach, and also combining 
bi-directed search with bounding boxes works very well. In the latter case, the 
speed-up is about 1.5 (as good as the speed-up of individual bi-directed search) 
for all types of graphs. 

Fig. 5. Speed-up relative to the combination without goal-directed search in terms
of visited nodes (in this order: street graphs in red, public transport graphs in blue,
random planar graphs in yellow, and random Waxman graphs in green)

Fig. 6. Speed-up relative to the combination without bi-directed search in terms of
visited nodes (in this order: street graphs in red, public transport graphs in blue,
random planar graphs in yellow, and random Waxman graphs in green)

Fig. 7. Speed-up relative to the combination without multi-level approach in terms
of visited nodes (in this order: street graphs in red, public transport graphs in blue,
random planar graphs in yellow, and random Waxman graphs in green)

Fig. 8. Speed-up relative to the combination without shortest-path bounding boxes in
terms of visited nodes (in this order: street graphs in red, public transport graphs in
blue, random planar graphs in yellow, and random Waxman graphs in green)

Multi-Level Approach. The multi-level approach crucially depends on the
decomposition of the graph. The Waxman graphs could not be decomposed 
properly by the multi-level approach, and therefore all combinations containing 
the latter yield speed-up factors of less than 1, which means a slowing down. 
Thus we consider only the remaining graph classes. 

Adding multi-levels to goal-directed and bi-directed search and their combi- 
nation gives a good improvement in the range between 5 and 12 for the number 
of nodes (see Figure 7). Because of the large overhead of the multi-level approach,
however, we get a considerable improvement in running time only for the real-
world graphs. In combination with bounding boxes, the multi-level approach is
beneficial only for the number of visited nodes in the case of street graphs. 

The multi-level approach allows tuning of several parameters, such as the 
number of levels and the choice of the selected nodes. The tuning crucially de- 
pends on the input graph [20]. Hence, we believe that considerable improvements 
of the presented results are possible if specific parameters are chosen for every 
single graph. 



Shortest-Path Bounding Boxes. Shortest-path bounding boxes work espe-
cially well when applied to planar graphs; in fact, the speed-up even increases with
the size of the graph (see Figure 8). For Waxman graphs, the situation is com- 
pletely different: with the graph size the speed-up gets smaller. This can be ex- 
plained by the fact that large Waxman graphs have, due to construction, more 
long-distance edges than small ones. Because of this, shortest paths become more 
tortuous and the bounding boxes contain more “wrong” nodes. 

Throughout the different types of graphs, bounding boxes individually as well 
as in combination with goal-directed and bi-directed search yield exceptionally 
high speed-ups. Only the combinations that include the multi-level approach 
cannot be improved that much. 



4.2 Overhead 

For goal-directed and bi-directed search, the overhead (time per visited node) 
is quite small, while for bounding boxes it is a factor of about 2 compared to 
plain Dijkstra (see Figures 9 and 10). The overhead caused by the multi-level 
approach is generally high and quite different, depending on the type of graph. 
As Waxman graphs do not decompose well, the overhead for the multi-level ap- 
proach is large and becomes even larger when the size of the graph increases. 
For very large street graphs, the multi-level approach overhead increases dra- 
matically. We assume that it would be necessary to add a third level for graphs 
of this size. 

It is also interesting to note that the relative overhead of the combina- 
tion goal-directed, bi-directed, and multi-level is smaller than just multi-level 
— especially for the generated graphs. 

Fig. 9. Average running time per visited node in µs for real-world graphs (in this order:
street graphs in red and public transport graphs in blue)

Fig. 10. Average running time per visited node in µs for generated graphs (in this
order: random planar graphs in yellow and random Waxman graphs in green)

5 Conclusion and Outlook

To summarize, we conclude that there are speed-up techniques that combine 
well and others where speed-up does not scale. Our result is that goal-directed
search and the multi-level approach form a good combination, and that bi-directed
search and shortest-path bounding boxes complement each other.

For real-world graphs, a combination including bi-directed search, multi-level, 
and bounding boxes is the best choice as to the number of visited nodes. In terms 
of running time, the winner is bi-directed search in combination with bounding 
boxes. For generated graphs, the best combination is goal-directed, bi-directed, 
and bounding boxes for both the number of nodes and running time. 

Without an expensive preprocessing, the combination of goal-directed and 
bi-directed search is generally the fastest algorithm with smallest search space — 
except for Waxman graphs. For these graphs, pure goal-directed is better than 
the combination with bi-directed search. Actually, goal-directed search is the 
only speed-up technique that works comparatively well for Waxman graphs. 
Because of this different behaviour, we conclude that planar graphs are a bet- 
ter approximation of the real-world graphs than Waxman graphs (although the 
public transport graphs are not planar). 

Except for bi-directed search, the speed-up techniques define a modified graph
in which a shortest path is searched. From this shortest path one can easily 
determine a shortest path in the original graph. It is an interesting question 
whether the techniques can be applied directly, or modified, to improve also the 
running time of other shortest-path algorithms. 

Furthermore, specialized priority queues used in Dijkstra’s algorithm have 
been shown to be fast in practice [26,27]. Using such queues would provide the 
same results for the number of visited nodes. Running times, however, would be 
different and therefore interesting to evaluate. 

Abstract. Bit-parallelism permits executing several operations simul-
taneously over a set of bits or numbers stored in a single computer word.
This technique permits searching for the approximate occurrences of a
pattern of length $m$ in a text of length $n$ in time $O(\lceil m/w \rceil n)$, where $w$ is
the number of bits in the computer word. Although this is asymptotically
the optimal speedup over the basic $O(mn)$ time algorithm, it wastes bit-
parallelism’s power in the common case where $m$ is much smaller than
$w$, since $w - m$ bits in the computer words remain unused.

In this paper we explore different ways to increase the bit-parallelism 
when the search pattern is short. First, we show how multiple patterns 
can be packed in a single computer word so as to search for multiple 
patterns simultaneously. Instead of paying $O(rn)$ time to search for $r$
patterns of length $m < w$, we obtain $O(\lceil r/\lfloor w/m \rfloor \rceil n)$ time. Second, we
show how the mechanism permits boosting the search for a single pattern
of length $m < w$, which can be searched for in time $O(n/\lfloor w/m \rfloor)$ instead
of $O(n)$. Finally, we show how to extend these algorithms so that the time
bounds essentially depend on k instead of m, where k is the maximum 
number of differences permitted. 

Our experimental results show that the algorithms work well in
practice, and are the fastest alternatives for a wide range of search pa- 
rameters. 



1 Introduction 

Approximate string matching is an old problem, with applications for example in 
spelling correction, bioinformatics and signal processing [7]. It refers in general 
to searching for substrings of a text that are within a predefined edit distance 
threshold from a given pattern. Let $T = T_{1 \dots n}$ be a text of length $n$ and
$P = P_{1 \dots m}$ a pattern of length $m$. Here $A_{a \dots b}$ denotes the substring of $A$ that begins
at its $a$th character and ends at its $b$th character, for $a \le b$. Let $ed(A,B)$ denote
the edit distance between the strings $A$ and $B$, and $k$ be the maximum allowed
distance. Then the task of approximate string matching is to find all text indices
$j$ for which $ed(P, T_{h \dots j}) \le k$ for some $h \le j$.

The most common form of edit distance is Levenshtein distance [5]. It is 
defined as the minimum number of single-character insertions, deletions and 
substitutions needed in order to make $A$ and $B$ equal. In this paper $ed(A,B)$
will denote Levenshtein distance. We also use $w$ to denote the computer word
size in bits, $\sigma$ to denote the size of the alphabet $\Sigma$ and $|A|$ to denote the length
of the string $A$.
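
To make the problem statement concrete, the following sketch computes Levenshtein distance and reports the approximate occurrences of a pattern with the plain $O(mn)$ dynamic programming baseline. It is not one of the bit-parallel algorithms discussed in this paper, and the function names are assumptions of this illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance ed(a, b), computed row by row."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def approximate_matches(pattern, text, k):
    """1-based end positions j such that some substring of text ending
    at j is within edit distance k of pattern."""
    col = list(range(len(pattern) + 1))  # column for the empty text prefix
    hits = []
    for j, ct in enumerate(text, start=1):
        new = [0]  # an occurrence may start at any text position
        for i, cp in enumerate(pattern, start=1):
            new.append(min(col[i] + 1,
                           new[i - 1] + 1,
                           col[i - 1] + (cp != ct)))
        col = new
        if col[-1] <= k:
            hits.append(j)
    return hits


print(edit_distance("kitten", "sitting"))
print(approximate_matches("abc", "xxabcxx", 1))
```

The only difference between the two functions is the top boundary value: setting it to 0 for every text position lets an occurrence start anywhere in the text.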

Bit-parallelism is the technique of packing several values in a single com- 
puter word and updating them all in a single operation. This technique has 
yielded the fastest approximate string matching algorithms if we exclude filtra- 
tion algorithms (which need anyway to be coupled with a non-filtration one). In 
particular, the $O(\lceil m/w \rceil kn)$ algorithm of Wu and Manber [13], the $O(\lceil km/w \rceil n)$
algorithm of Baeza-Yates and Navarro [1], and the $O(\lceil m/w \rceil n)$ algorithm of My-
ers [6] dominate for almost every value of $m$, $k$ and $\sigma$.

In complexity terms, Myers’ algorithm is superior to the others. In practice,
however, Wu & Manber’s algorithm is faster for $k = 1$ and Baeza-Yates and
Navarro’s is faster when $(k + 2)(m - k) \le w$ or $k/m$ is low. The reason is that,
although Myers’ algorithm packs the state of the search better (needing to
update fewer computer words), it needs slightly more operations than its competi-
tors. Except when $m$ and $k$ are small, the need to update fewer computer words
makes Myers’ algorithm faster than the others. However, when $m$ is much smaller
than $w$, Myers’ advantage disappears because all three algorithms need to
update just one (or very few) computer words. In this case, Myers’ representa-
tion wastes many bits of the computer word and is unable to take advantage of
its more compact representation.

The case where m is much smaller than w is very common in several ap- 
plications. Typically w is 32 or 64 in a modern computer, and for example the 
Pentium 4 processor allows one to use even words of size 128. Myers’ representa- 
tion uses m bits out of those w. In spelling, for example, it is usual to search for 
words, whose average length is 6. In computational biology one can search for 
short DNA or amino acid sequences, of length as small as 4. In signal processing 
applications one can search for sequences composed of a few audio, MIDI or 
video samples. 

In this paper we concentrate on reducing the number of wasted bits in Myers’
algorithm, so as to take advantage of its better packing of the search state even
when m ≤ w. This has been attempted previously [2], where O(m⌈n/w⌉) time
was obtained. Our technique is different. We first show how to search for several
patterns simultaneously by packing them all in the same computer word. We can
search for r patterns of length m ≤ w in O(⌈r/⌊w/m⌋⌉n + occ) rather than O(rn)
time, where occ ≤ rn is the total number of occurrences of all the patterns. We
then show how this idea can be pushed further to boost the search for a single
pattern, so as to obtain O(n/⌊w/m⌋) time instead of O(n) for m ≤ w.

Our experimental results show that the presented schemes work well in prac- 
tice. 

2 Dynamic Programming 

In the following ε denotes the empty string. To compute the Levenshtein distance
ed(A, B), the dynamic programming algorithm fills an (|A| + 1) × (|B| + 1) table D,
in which each cell D[i,j] will eventually hold the value ed(A_{1..i}, B_{1..j}). Initially
the trivially known boundary values D[i,0] = ed(A_{1..i}, ε) = i and D[0,j] =
ed(ε, B_{1..j}) = j are filled. Then the cells D[i,j] are computed for i = 1 … |A|
and j = 1 … |B| until the desired solution D[|A|,|B|] = ed(A_{1..|A|}, B_{1..|B|}) =
ed(A, B) is known. When the values D[i − 1,j − 1], D[i,j − 1] and D[i − 1,j]
are known, the value D[i,j] can be computed by using the following well-known
recurrence.

\[
D[i,0] = i, \qquad D[0,j] = j,
\]
\[
D[i,j] =
\begin{cases}
D[i-1,j-1], & \text{if } A_i = B_j,\\
1 + \min(D[i-1,j-1],\; D[i-1,j],\; D[i,j-1]), & \text{otherwise.}
\end{cases}
\]

This distance computation algorithm is easily modified to find approximate 
occurrences of A somewhere inside B [9]. This is done simply by changing 
the boundary condition D[0,j] = j into D[0,j] = 0. In this case D[i,j] =
min{ed(A_{1..i}, B_{h..j}) : h ≤ j}, which corresponds to the earlier definition of
approximate string matching if we replace A with P and B with T.

The values of D are usually computed by filling it in a column-wise manner 
for increasing j. This corresponds to scanning the string B (or the text T) one 
character at a time from left to right. At each character the corresponding column 
is completely filled in order of increasing i. This order makes it possible to save 
space by storing only one column at a time, since then the values in column j
depend only on already computed values in it or values in column j − 1.
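
As a rough illustration of the column-wise computation, the following Python sketch keeps a single column of D and applies the recurrence with the boundary condition D[0,j] = 0. The function name dp_search and the convention of returning 1-based end positions are assumptions made for this example only.

```python
def dp_search(pattern, text, k):
    """Return the 1-based text positions j where D[m, j] <= k."""
    m = len(pattern)
    col = list(range(m + 1))      # column 0: D[i, 0] = i
    hits = []
    for j, c in enumerate(text, start=1):
        diag = col[0]             # D[0, j-1], which is always 0
        col[0] = 0                # boundary D[0, j] = 0: a match may start anywhere
        for i in range(1, m + 1):
            old = col[i]          # D[i, j-1], about to be overwritten
            if pattern[i - 1] == c:
                col[i] = diag     # D[i-1, j-1]
            else:
                col[i] = 1 + min(diag, old, col[i - 1])
            diag = old
        if col[m] <= k:
            hits.append(j)
    return hits
```

For instance, dp_search("abc", "xabcx", 1) reports the positions 3, 4 and 5, because "ab", "abc" and "abcx" are each within one edit of the pattern.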

Some properties of matrix D are relevant to our paper [11]: 

-The diagonal property: D[i,j] − D[i − 1,j − 1] = 0 or 1.

-The adjacency property: D[i,j] − D[i,j − 1] = −1, 0, or 1, and
D[i,j] − D[i − 1,j] = −1, 0, or 1.

3 Myers’ Bit-Parallel Algorithm 

In what follows we will use the following notation in describing bit-operations:
’&’ denotes bitwise “and”, ’|’ denotes bitwise “or”, ’^’ denotes bitwise “xor”,
’~’ denotes bit complementation, and ’<<’ and ’>>’ denote shifting the
bit-vector left and right, respectively, using zero filling in both directions. The ith
bit of the bit vector V is referred to as V[i] and bit positions are assumed to
grow from right to left. In addition we use superscripts to denote repetition. As
an example let V = 1011010 be a bit vector. Then V[1] = V[3] = V[6] = 0,
V[2] = V[4] = V[5] = V[7] = 1, and we could also write V = 101^{2}010 or
V = 101(10)^{2}.
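
The notation can be checked with a few lines of Python, where an integer stands in for the bit vector. The helper names bit and rep are hypothetical and exist only for this illustration.

```python
V = 0b1011010

def bit(v, i):
    """Return V[i], with positions numbered from 1 and growing from right to left."""
    return (v >> (i - 1)) & 1

def rep(bits, count):
    """Repeat a bit string, mirroring the superscript notation for repetition."""
    return bits * count

bits_1_to_7 = [bit(V, i) for i in range(1, 8)]
same_as_101_2_010 = int("10" + rep("1", 2) + "010", 2) == V
same_as_101_10_2 = int("101" + rep("10", 2), 2) == V
```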

We describe here a version of the algorithm [3,8] that is slightly simpler than 
the original by Myers [6]. The algorithm is based on representing the dynamic 
programming table D with vertical, horizontal and diagonal differences and
precomputing the matching positions of the pattern into an array of size σ. This is
done by using the following length-m bit-vectors:

-Vertical positive delta: VP[i] = 1 at text position j if and only if D[i,j] −
D[i − 1,j] = 1.

-Vertical negative delta: VN[i] = 1 at text position j if and only if D[i,j] −
D[i − 1,j] = −1.

-Horizontal positive delta: HP[i] = 1 at text position j if and only if D[i,j] −
D[i,j − 1] = 1.

-Horizontal negative delta: HN[i] = 1 at text position j if and only if D[i,j] −
D[i,j − 1] = −1.

-Diagonal zero delta: D0[i] = 1 at text position j if and only if D[i,j] =
D[i − 1,j − 1].

-Pattern match vector PM_{λ} for each λ ∈ Σ: PM_{λ}[i] = 1 if and only if P_{i} = λ.

Initially VP = 1^{m} and VN = 0^{m} to enforce the boundary condition D[i,0] = i.
At text position j the algorithm first computes vector D0 by using the old values
VP and VN and the pattern match vector PM_{T_j}. Then the new HP and HN
are computed by using D0 and the old VP and VN. Finally, vectors VP and
VN are updated by using the new D0, HN and HP. Fig. 1 shows the complete
formula for updating the vectors, and Fig. 2 shows the preprocessing of table
PM and the higher-level search scheme. We refer the reader to [3,6] for a more
detailed explanation of the formula in Fig. 1.



Step(j)
1. D0 ← (((PM_{T_j} & VP) + VP) ^ VP) | PM_{T_j} | VN
2. HP ← VN | ~(D0 | VP)
3. HN ← VP & D0
4. VP ← (HN << 1) | ~(D0 | (HP << 1))
5. VN ← (HP << 1) & D0

Fig. 1. Updating the delta vectors at column j.



The algorithm in Fig. 2 computes the value D[m,j] explicitly in the currDist 
variable by using the horizontal delta vectors (the initial value of currDist is 
D[m,0] = m). A pattern occurrence with at most k errors is found at text
position j whenever D[m,j] ≤ k.

ComputePM(P)
1. For λ ∈ Σ Do PM_{λ} ← 0^{m}
2. For i ∈ 1 … m Do PM_{P_i} ← PM_{P_i} | 0^{m−i}10^{i−1}

Search(P, T, k)
1. ComputePM(P)
2. VN ← 0^{m}, VP ← 1^{m}, currDist ← m
3. For j ∈ 1 … n Do
4.    Step(j)
5.    If HP & 10^{m−1} ≠ 0^{m} Then
6.       currDist ← currDist + 1
7.    Else If HN & 10^{m−1} ≠ 0^{m} Then
8.       currDist ← currDist − 1
9.    If currDist ≤ k Then
10.      Report occurrence at j

Fig. 2. Preprocessing the PM-table and conducting the search.

We point out that the boundary condition D[0,j] = 0 is enforced on lines
4 and 5 in Fig. 1. After the horizontal delta vectors HP and HN are shifted
left, their first bits correspond to the difference D[0,j] − D[0,j − 1]. This is the
only phase in the algorithm where the values from row 0 are relevant. And as
we assume zero filling, the left shifts correctly set HP[1] = HN[1] = 0 to encode
the difference D[0,j] − D[0,j − 1] = 0.

The running time of the algorithm is O(n) when m ≤ w, as there are only a
constant number of operations per text character. The general running time is
O(⌈m/w⌉n) as a vector of length m may be simulated in O(⌈m/w⌉) time using
O(⌈m/w⌉) bit-vectors of length w.
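
The update formula and the search loop of Figs. 1 and 2 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: Python integers are unbounded, so every vector is explicitly masked to m bits, and the names compute_pm and myers_search are chosen for this example.

```python
def compute_pm(pattern):
    """PM[c] has bit i-1 set iff pattern[i-1] == c (bit 1 is the lowest bit)."""
    pm = {}
    for i, c in enumerate(pattern):
        pm[c] = pm.get(c, 0) | (1 << i)
    return pm

def myers_search(pattern, text, k):
    """Return the 1-based text positions j where D[m, j] <= k."""
    m = len(pattern)
    mask = (1 << m) - 1           # keeps each vector at length m
    top = 1 << (m - 1)            # the mask 1 0^(m-1)
    pm = compute_pm(pattern)
    vp, vn = mask, 0              # VP = 1^m, VN = 0^m
    curr_dist = m                 # D[m, 0] = m
    hits = []
    for j, c in enumerate(text, start=1):
        eq = pm.get(c, 0)         # characters absent from the pattern match nothing
        d0 = ((((eq & vp) + vp) ^ vp) | eq | vn) & mask
        hp = (vn | ~(d0 | vp)) & mask
        hn = vp & d0
        if hp & top:              # D[m, j] - D[m, j-1] = +1
            curr_dist += 1
        elif hn & top:            # D[m, j] - D[m, j-1] = -1
            curr_dist -= 1
        vp = ((hn << 1) | ~(d0 | (hp << 1))) & mask
        vn = (hp << 1) & d0
        if curr_dist <= k:
            hits.append(j)
    return hits
```

The distance counter is updated from the unshifted HP and HN before they are used to refresh VP and VN, which matches the order Step(j), then lines 5 to 8 of Search in Fig. 2.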

4 Searching for Several Patterns Simultaneously

We show how Myers’ algorithm can be used to search for r patterns of length
m simultaneously. For simplicity we will assume rm ≤ w; otherwise the search
patterns must be split into groups of at most ⌊w/m⌋ patterns each, and each
group searched for separately. Hence our search time will be O(⌈r/⌊w/m⌋⌉n +
occ), as opposed to the O(rn) time that would be achieved by searching for each
pattern separately. Here occ ≤ rn stands for the total number of occurrences of
all the patterns. When w/m ≥ 2, our complexity can be written as O(⌈rm/w⌉n +
occ).

Consider the situation where w/m ≥ 2 and Myers’ algorithm is used. Fig. 3a
shows how the algorithm fails to take full advantage of bit-parallelism in that
situation, as at least one half of the bits in the bit vectors is not used. Fig. 3b
depicts our proposal: encode several patterns into the bit vectors and search
for them in parallel. There are several obstacles in achieving this goal correctly,
which will be discussed next.

Fig. 3. For short patterns (m < w) Myers’ algorithm (a) wastes w − m bits. Our
proposal (b) packs several patterns into the same computer word, and wastes only
w − rm bits.



4.1 Updating the Delta Vectors 

A natural starting point is the problem of encoding and updating several patterns 
in the delta vectors. Let us denote a parallel version of a delta vector with 
the superscript p. We encode the patterns repeatedly into the vectors without 
leaving any space between them. For example D0^{p}[i] corresponds to the bit
D0[((i − 1) mod m) + 1] in the D0-vector of the ⌈i/m⌉th pattern. The pattern
match vectors PM are computed in normal fashion for the concatenation of 
the patterns. This correctly aligns the patterns with their positions in the bit 
vectors. 

When the parallel vectors are updated, we need to ensure that the values for
different patterns do not interfere with each other and that the boundary values
D[0,j] = 0 are used correctly. From the update formula in Fig. 1 it is obvious
that only the addition (“+”) on line 1 and the left shifts on lines 4 and 5 can
cause incorrect interference.

The addition operation may be handled by temporarily setting off the bits in
VP^{p} that correspond to the last characters of the patterns. When this is done
before the addition, there cannot be an incorrect overflow, and on the other hand
the correct behaviour of the algorithm is not affected: The value VP^{p}[i] can affect
only the values D0^{p}[i + h] for some h > 0. It turns out that a similar modification
works also with the left shifts. If the bits that correspond to the last characters
of the patterns are temporarily set off in HP^{p} and HN^{p} then, after shifting left,
the positions in HP^{p} and HN^{p} that correspond to the first characters of the
patterns will correctly have a zero bit. The first pattern gets the zero bits from
zero filling of the shift. Therefore, this second modification both removes possible
interference and enforces the boundary condition D[0,j] − D[0,j − 1] = 0.

Both modifications are implemented by and-ing the corresponding vectors
with the bit mask ZM = (01^{m−1})^{r}. Figure 4 gives the code for a step.

MStep(j)
1. XP ← VP & ZM
2. D0 ← (((PM_{T_j} & XP) + XP) ^ XP) | PM_{T_j} | VN
3. HP ← VN | ~(D0 | VP)
4. HN ← VP & D0
5. XP ← HP & ZM, XN ← HN & ZM
6. VP ← (XN << 1) | ~(D0 | (XP << 1))
7. VN ← (XP << 1) & D0

Fig. 4. Updating the delta vectors at column j, when searching for multiple patterns.

4.2 Keeping the Scores

A second problem is computing the value D[m,j] explicitly for each of the r
patterns. We handle this by using bit-parallel counters in a somewhat similar
fashion to [4]. Let MC be a length-rm bit-parallel counter vector. We set up into
MC an m-bit counter for each pattern. Let MC(i) be the value of the ith counter.
The counters are aligned with the patterns so that MC(1) occupies the first m
bits, MC(2) the next m bits, and so on. We will represent value zero in each
counter as b = 2^{m−1} + k, and the value MC(i) will be translated to actually mean
b − MC(i). This gives each counter MC(i) the following properties: (1) b < 2^{m}.
(2) b − m ≥ 0. (3) The mth bit of MC(i) is set if and only if b − MC(i) ≤ k.
(4) In terms of updating the translated value of MC(i), the roles of adding and
subtracting from it are reversed.

The significance of properties (1) and (2) is that they ensure that the values 
of the counters will not overflow outside their regions. Their correctness depends 
on the assumption k < m. This is not a true restriction as it excludes only the 
case of trivial matching (k = m) . 

We use a length-rm bit-mask EM = (10^{m−1})^{r} to update MC. The bits set
in HP^{p} & EM and HN^{p} & EM correspond to the last bits of the counters
that need to be incremented and decremented, respectively. Thus, remembering
to reverse addition and subtraction, MC may be updated by setting MC ←
MC + ((HN^{p} & EM) >> (m − 1)) − ((HP^{p} & EM) >> (m − 1)).
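
The translated counters can be illustrated in isolation with the short Python sketch below. The helper names are hypothetical, and the starting value b − m is an assumption made so that the translated value equals D[m,0] = m before any text is read.

```python
def end_mask(m, r):
    """EM = (1 0^(m-1))^r: the highest bit of each m-bit counter."""
    return sum(1 << (i * m + m - 1) for i in range(r))

def make_counters(m, r, k):
    """Pack r counters that all hold b - m, i.e. translated distance m."""
    b = (1 << (m - 1)) + k
    ones = sum(1 << (i * m) for i in range(r))       # (0^(m-1) 1)^r
    return (b - m) * ones

def update_counters(mc, hp, hn, m, r):
    """Apply the update rule: HN raises a stored counter, HP lowers it."""
    em = end_mask(m, r)
    return mc + ((hn & em) >> (m - 1)) - ((hp & em) >> (m - 1))

def distances(mc, m, r, k):
    """Decode the translated values b - MC(i) for i = 1 .. r."""
    b = (1 << (m - 1)) + k
    field = (1 << m) - 1
    return [b - ((mc >> (i * m)) & field) for i in range(r)]
```

With m = 4, r = 2 and k = 1, both counters start at distance 4, and the highest bit of a counter becomes set exactly when its decoded distance drops to k or below.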

Property (3) means that the last bit of MC(i) signals whether the ith pattern
matches at the current position. Hence, whenever MC & EM ≠ 0^{rm} we have
an occurrence of some of the patterns in T. At this point we can examine the
bit positions of EM one by one to determine which patterns have matched and
report their occurrences. This, however, adds O(r · min(n, occ)) time in the worst
case to report the occ occurrences of all the patterns. We show next how to
reduce this to O(occ).

Fig. 5 gives the code to search for the patterns P^{1} … P^{r}.

MComputePM(P^{1} … P^{r})
1. For λ ∈ Σ Do PM_{λ} ← 0^{rm}
2. For s ∈ 1 … r Do
3.    For i ∈ 1 … m Do PM_{P^s_i} ← PM_{P^s_i} | 0^{m(r−s+1)−i}10^{m(s−1)+i−1}

MSearch(P^{1} … P^{r}, T, k)
1. MComputePM(P^{1} … P^{r})
2. ZM ← (01^{m−1})^{r}, EM ← (10^{m−1})^{r}
3. VN ← 0^{rm}, VP ← 1^{rm}
4. MC ← (2^{m−1} + k − m) × (0^{m−1}1)^{r}
5. For j ∈ 1 … n Do
6.    MStep(j)
7.    MC ← MC + ((HN & EM) >> (m − 1)) − ((HP & EM) >> (m − 1))
8.    If MC & EM ≠ 0^{rm} Then MReport(j, MC & EM)

Fig. 5. Preprocessing the PM-table and conducting the search for multiple patterns.

4.3 Reporting the Occurrences

Let us assume that we want to identify which bits in mask OM = MC & EM
are set, in time proportional to the number of bits set. If we achieve this, the
total time to report all the occ occurrences of all the patterns will be O(occ). One
choice is to precompute a table F that, for any value of OM, gives the position
of the first bit set in OM. That is, if F[OM] = s, then we report an occurrence
of the (s/m)th pattern at the current text position j, clear the sth bit in OM
by doing OM ← OM & ~(1 << (s − 1)), and repeat until OM becomes zero.

The only problem of this approach is that table F has 2^{rm} entries, which is
too much. Fortunately, we can compute the s values efficiently without resorting
to look-up tables. The key observation is that the position of the highest bit set
in OM is effectively the function ⌊log_2(OM)⌋ + 1 (we number the bits from 1 to
w), i.e. it holds that

\[
2^{\lfloor \log_2(x) \rfloor} \le x < 2^{\lfloor \log_2(x) \rfloor + 1},
\]

or, written with shifts, 1 << ⌊log_2(x)⌋ ≤ x < 1 << (⌊log_2(x)⌋ + 1).

The function ⌊log_2(x)⌋ for an integer x can be computed in O(1) time in modern
computer architectures by converting x into a floating point number, and
extracting the exponent, which requires only two additions and a shift. This
assumes that the floating point number is represented in a certain way, in
particular that the radix is 2, and that the number is normalized. The “industry
standard” IEEE floating point representation meets these requirements. For the
details and other solutions for the integer logarithm of base 2, refer e.g. to [12].
ISO C99 standard conforming C compilers also provide a function to extract the
exponent directly, and many CPUs even have a dedicated machine instruction
for the ⌊log_2(x)⌋ function. Fig. 6 gives the code.
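
The exponent trick can be tried out in Python, whose math.frexp plays the role of the exponent-extracting library function mentioned above. The sketch assumes 0 < x < 2^53 so that the conversion to a double is exact; floor_log2 is a name chosen for this example.

```python
import math

def floor_log2(x):
    """floor(log2(x)) for an integer 0 < x < 2**53, read off the float exponent."""
    # frexp returns (mantissa, exponent) with x == mantissa * 2**exponent
    # and 0.5 <= mantissa < 1, so the highest set bit has index exponent - 1.
    _mantissa, exponent = math.frexp(float(x))
    return exponent - 1
```

In Python itself the same value is available without floating point as x.bit_length() - 1, which is what the later sketches in this section would use.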

For architectures where ⌊log_2(x)⌋ is hard to compute, we can still manage
to obtain O(min(n, occ) log r) time as follows. To detect the bits set in OM, we
check its two halves. If some half is zero, we can finish there. Otherwise, we
recursively check its two halves. We continue the process until we have isolated
each individual bit set in OM. In the worst case, each such bit has cost us
O(log r) halving steps.

MReport(j, OM)
1. While OM ≠ 0^{rm} Do
2.    s ← ⌊log_2(OM)⌋
3.    Report occurrence of P^{(s+1)/m} at text position j
4.    OM ← OM & ~(1 << s)

Fig. 6. Reporting occurrences at current text position.
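
Putting Figs. 4, 5 and 6 together, a multi-pattern search for r patterns of equal length m might look like the Python sketch below. It is only an illustration: the name multi_search is hypothetical, vectors are masked to rm bits because Python integers are unbounded, and the counters are started at b − m (an assumption that makes the translated value equal D[m,0] = m).

```python
def multi_search(patterns, text, k):
    """Return (j, s) pairs: pattern number s (1-based) matches ending at position j."""
    r, m = len(patterns), len(patterns[0])
    assert all(len(p) == m for p in patterns) and 0 <= k < m
    full = (1 << (r * m)) - 1
    ones = sum(1 << (s * m) for s in range(r))   # (0^(m-1) 1)^r
    em = ones << (m - 1)                         # EM = (1 0^(m-1))^r
    zm = full ^ em                               # ZM = (0 1^(m-1))^r
    pm = {}
    for s, p in enumerate(patterns):             # PM of the concatenated patterns
        for i, c in enumerate(p):
            pm[c] = pm.get(c, 0) | (1 << (s * m + i))
    vp, vn = full, 0
    mc = ((1 << (m - 1)) + k - m) * ones         # every counter holds b - m
    out = []
    for j, c in enumerate(text, start=1):
        eq = pm.get(c, 0)
        xp = vp & zm                             # stop carries at pattern borders
        d0 = ((((eq & xp) + xp) ^ xp) | eq | vn) & full
        hp = (vn | ~(d0 | vp)) & full
        hn = vp & d0
        xp, xn = hp & zm, hn & zm                # stop shifts at pattern borders
        vp = ((xn << 1) | ~(d0 | (xp << 1))) & full
        vn = (xp << 1) & d0
        mc += ((hn & em) >> (m - 1)) - ((hp & em) >> (m - 1))
        om = mc & em
        while om:                                # MReport: highest set bit first
            s = om.bit_length() - 1              # floor(log2(om))
            out.append((j, (s + 1) // m))
            om &= ~(1 << s)
    return out
```

Because the highest bit is extracted first, several patterns matching at the same text position are reported in decreasing pattern number.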



4.4 Handling Different Lengths and Thresholds 

For simplicity we have assumed that all the patterns are of the same length and
are all searched with the same k. The method, however, can be adapted with
little difficulty to different m and k for each pattern.

If the lengths are m_{1} … m_{r} and the thresholds are k_{1} … k_{r}, we have to and
the vertical and horizontal vectors with ZM = 01^{m_r−1} 01^{m_{r−1}−1} … 01^{m_1−1}, and
this fixes the problem of updating the delta vectors. With respect to the counters,
the ith counter must be represented as b_{i} − MC(i), where b_{i} = 2^{m_i−1} + k_{i}.

One delicacy is the update of MC, since the formula we gave to align all
the HP^{p} bits at the beginning of the counters involved “>> (m − 1)”, and this
works only when all the patterns are of the same length. If they are not, we could
align the counters so that they start at the end of their areas, hence removing
the need for the shift at all. To avoid overflows, we should sort the patterns in
increasing length order prior to packing them in the computer word. The price
is that we will need m_{r} extra bits at the end of the bit mask to hold the largest
counter. An alternative solution would be to handle the last counter separately.
This would avoid the shifts, and effectively adds only a few operations.

Finally, reporting the occurrences works just as before, except that the pattern
number we report is no longer (s + 1)/m (Fig. 6). The correct pattern
number can be computed efficiently e.g. using a look-up table indexed with s.
The size of the table is only O(w), as s ≤ w − 1.

5 Boosting the Search for One Pattern 

Up to now we have shown how to take advantage of wasted bits by searching 
for several patterns simultaneously. Yet, if we only want to search for a single 
pattern, we still waste the bits. In this section we show how the technique de- 
veloped for multiple patterns can be adapted to boost the search for a single 
pattern. 

The main idea is to search for multiple copies of the same pattern P and
parallelize the access to the text. Say that r = ⌊w/m⌋. Then we search for r
copies of P using a single computer word, with the same technique developed
for multiple patterns.

Of course this is of little interest in principle, as all the copies of the pattern
will report the same occurrences. However, the key idea here will be to search a
different text segment for each pattern copy. We divide the text T into r equal-sized
subtexts T = T^{1} T^{2} … T^{r}. Text T^{s}, of length ℓ = ⌈n/r⌉, will be searched
for the sth copy of P, and therefore all the occurrences of P in T will be found.

Our search will perform ⌈n/r⌉ steps, where step j will access r text characters
T_{j}, T_{j+ℓ}, T_{j+2ℓ}, …, T_{j+(r−1)ℓ}. With those r characters c_{1} … c_{r} we should build
the corresponding PM mask to execute a single step. This is easily done by
using

PM ← PM_{c_1} | (PM_{c_2} << m) | (PM_{c_3} << 2m) | … | (PM_{c_r} << (r − 1)m)

We must exercise some care at the boundaries between consecutive text segments.
On the one hand, processing of text segment T^{s} (1 ≤ s < r) should
continue up to m + k − 1 characters in T^{s+1} in order to provide the adequate
context for the possible occurrences in the beginning of T^{s+1}. On the other hand,
the processing of T^{s+1} must avoid reporting occurrences at the first m + k − 1
positions to avoid reporting them twice. Finally, occurrences may be reported
out of order if printed immediately, so it is necessary to store them in r buffer
arrays in order to report them ordered at the end.

Adding up the ⌈n/r⌉ = ⌈n/⌊w/m⌋⌉ bit-parallel steps required plus the n
character accesses to compute PM, we obtain O(⌈n/⌊w/m⌋⌉) complexity for
m ≤ w.
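
The segment-parallel search can be sketched in Python as below. This is a sketch under stated assumptions, not the authors' code: the word size w is a parameter, each copy simply runs m + k − 1 steps past the end of its segment, characters read beyond the end of the text are treated as matching nothing, and boosted_search is a name chosen for this example.

```python
def boosted_search(pattern, text, k, w=64):
    """Return the sorted 1-based end positions where pattern matches with <= k errors."""
    m, n = len(pattern), len(text)
    assert 0 <= k < m <= w
    r = w // m                                   # copies of P per word
    seg = -(-n // r)                             # segment length, ceil(n / r)
    overlap = m + k - 1                          # context shared with the next segment
    full = (1 << (r * m)) - 1
    ones = sum(1 << (s * m) for s in range(r))
    em = ones << (m - 1)
    zm = full ^ em
    pm = {}
    for i, c in enumerate(pattern):
        pm[c] = pm.get(c, 0) | (1 << i)
    vp, vn = full, 0
    mc = ((1 << (m - 1)) + k - m) * ones
    buffers = [[] for _ in range(r)]             # one buffer per copy keeps the order
    for j in range(1, seg + overlap + 1):
        eq = 0
        for s in range(r):                       # PM_{c_1} | (PM_{c_2} << m) | ...
            pos = s * seg + j - 1                # 0-based index read by copy s
            if pos < n:
                eq |= pm.get(text[pos], 0) << (s * m)
        xp = vp & zm
        d0 = ((((eq & xp) + xp) ^ xp) | eq | vn) & full
        hp = (vn | ~(d0 | vp)) & full
        hn = vp & d0
        xp, xn = hp & zm, hn & zm
        vp = ((xn << 1) | ~(d0 | (xp << 1))) & full
        vn = (xp << 1) & d0
        mc += ((hn & em) >> (m - 1)) - ((hp & em) >> (m - 1))
        om = mc & em
        while om:
            top = om.bit_length() - 1
            om &= ~(1 << top)
            s = top // m
            end = s * seg + j                    # 1-based end position in the text
            # A copy other than the first skips its first m + k - 1 positions,
            # which the previous copy has already covered.
            if end <= n and (s == 0 or j > overlap):
                buffers[s].append(end)
    return [p for buf in buffers for p in buf]
```

Copy s reports exactly the positions from s·ℓ + m + k up to (s + 1)·ℓ + m + k − 1 (the first copy starts at position 1), so the buffers concatenate into a sorted list without duplicates.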

6 Long Patterns and k-Differences Problem

We have shown how to utilize the bits in a computer word economically, but our
methods assume that m ≤ w. We now sketch a method that can handle longer
patterns, and can pack more patterns in the same computer word. The basic
assumption here is that we are only interested in pattern occurrences that have
at most k differences. This is the situation that is most interesting in practice,
and usually we can assume that k is much smaller than m. Our goal is to obtain
similar time bounds as above, but replace m with k in the complexities. The
difference will be that these become average case complexities now.

The method is similar to our basic algorithms, but now we use an adaptation
of Ukkonen’s well-known “cut-off” algorithm [10]. That algorithm fills the table
D in column-wise order, and computes the values D[i,j] in column j for only
i ≤ ℓ_{j}, where

\[
\ell_j = 1 + \max\{\, i \mid D[i,j-1] \le k \,\}.
\]

The cut-off heuristic is based on the fact that the search result does not depend
on cells whose value is larger than k. From the diagonal property it follows that
once D[i,j] > k, then D[i + h,j + h] > k for all h ≥ 0 (within the bounds of D).
And a consequence of this is that D[i,j] > k for i > ℓ_{j}.

After evaluating the current column of the matrix up to the row ℓ_{j}, the value
ℓ_{j+1} is computed, and the algorithm continues with the next column j + 1. The
evaluation of ℓ_{j} takes O(1) amortized time, and its expected value L(k) is O(k),
and hence the whole algorithm takes only O(nk) time.
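
The cut-off heuristic can be illustrated with the plain (not bit-parallel) Python sketch below. It is one possible way to realize the idea, under the assumption that a cell known to exceed k may be stored as the stand-in value k + 1; the name cutoff_search is chosen for this example.

```python
def cutoff_search(pattern, text, k):
    """Return the 1-based text positions j where D[m, j] <= k, using the cut-off."""
    m = len(pattern)
    col = list(range(m + 1))          # column 0: D[i, 0] = i
    last = min(k, m)                  # largest i with D[i, 0] <= k
    hits = []
    for j, c in enumerate(text, start=1):
        top = min(m, last + 1)        # l_j: only rows 1 .. l_j are evaluated
        diag = col[0]                 # D[0, j-1] = 0
        for i in range(1, top + 1):
            old = col[i]
            if pattern[i - 1] == c:
                col[i] = diag
            else:
                col[i] = 1 + min(diag, old, col[i - 1])
            diag = old
        if top < m:
            col[top + 1] = k + 1      # stand-in: this cell is known to exceed k
        last = top
        while col[last] > k:          # col[0] == 0 always stops the scan
            last -= 1
        if last == m:                 # row m is active, so D[m, j] <= k
            hits.append(j)
    return hits
```

The row index last grows by at most one per column, so the backward scan that shrinks it costs O(1) amortized time per text character.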
Myers adapted his O(n⌈m/w⌉) algorithm to use the cut-off heuristic as well.
In principle the idea is very simple; since on average the search ends at row L(k),
it is enough to use only L(k) bits of the computer word on average (actually he
used w⌈L(k)/w⌉ bits), and only in some text positions (e.g. when the pattern
matches) one has to use more bits. Only two modifications to the basic method
are needed. We must be able to decide which is the last active row in order to
compute the number of bits required for each text position, and we must be
able to handle the communication between the boundaries of the consecutive
computer words. Both problems are easy to solve; for details refer to [6]. With
these modifications Myers was able to obtain his O(n⌈L(k)/w⌉) average time
algorithm.

We can do exactly the same here. We use only b = max{L(k), ⌈log(m + k)⌉ + 1}
bits for each pattern and pack them into the same computer word just like in our
basic method. We need L(k) bits as L(k) is the row number where the search
is expected to end, and at least ⌈log(m + k)⌉ + 1 bits to avoid overflowing the
counters. Therefore we are going to search for ⌊w/b⌋ patterns in parallel.

If for some text positions b bits are not enough, we use as many computer
words as needed, each having b bits allocated for each pattern. Therefore, the
b-bit blocks in the first computer word correspond to the first b characters of
the corresponding patterns, and the b-bit blocks in the second word correspond
to the next b characters of the patterns, and so on. In total we need ⌈m/b⌉
computer words, but on average use only one for each text position.

The counters for each pattern have only b bits now, which means that the
maximum pattern length is limited to 2^{b−1} − k. The previous counters limited
the pattern length to 2^{m−1} − k, but at the same time assumed that the pattern
length was ≤ w/2. Using the cut-off method, we have fewer bits for the counters,
but in effect we can use longer patterns, the upper bound being m = 2^{b−1} − k.

The tools we have developed for the basic method can be applied to modify
Myers’ cut-off algorithm to search for ⌊w/b⌋ patterns simultaneously. The only
additional modification we need is that we must add a new computer word
whenever any of the pattern counters has accumulated k differences, and this is
trivial to detect with our counters model. On the other hand, this modification
means that L(k) must grow as a function of r. It has been shown in [7] that
L(k) = k/(1 − e/√σ) + O(1) for r = 1. For reasonably small r this bound should
not be affected much, as the probability of a match is exponentially decreasing
for m > L(k).

The result is that we can search for r patterns with at most k differences in
O(n⌈r/⌊w/b⌋⌉) expected time. Finally, it is possible to apply the same scheme
for single pattern search as well, resulting in O(⌈n/⌊w/b⌋⌉) expected time. The
method is useful even for short patterns (where we could apply our basic method
also), because we can use tighter packing when b < m.

7 Experimental Results

We have implemented all the algorithms in C and compiled them using GCC 
3.3.1 with full optimizations. The experiments were run on a Sparc Ultra 2 with 
128 MB RAM that was dedicated solely for our tests. The word size of the machine 
is 64 bits. 

In the experiments we used DNA from baker’s yeast and natural language 
English text from the TREC collection. Each text was cut into 4 million charac- 
ters for testing. The patterns were selected randomly from the texts. We com- 
pared the performance of our algorithms against previous work. The algorithms 
included in the experiments were: 

Parallel BPM: Our parallelized single-pattern search algorithm (Section 5). 
We used r = 3 for m = 8 and m = 16, r = 2 for m = 32, and r = 2 and 
cut-off (Section 6) for m = 64. 

Our multi-pattern algorithm: The basic multipattern algorithm (Section 4) 
or its cut-off version (Section 6). We determined which version to use by 
using experimentally computed estimates for L(k). 

BPM: Myers’ original algorithm [6], whose complexity is O(⌈m/w⌉n). We used
our implementation, which was roughly 20 % faster than the original code
of Myers on the test computer.

BPD: Non-deterministic finite state automaton bit-parallelized by diagonals
[1]. The complexity is O(⌈km/w⌉n). Implemented by its original authors.

BPR: Non-deterministic finite state automaton bit-parallelized by rows [13].
The complexity is O(⌈m/w⌉kn). We used our implementation, with hand
optimized special code for k = 1 … 7.

For each tested combination of m and k, we measured the average time per 
pattern when searching for 50 patterns. The set of patterns was the same for 
each algorithm. The results are shown in Fig. 7. Our algorithms are clear winners 
in most of the cases. 

Our single-pattern parallel search algorithm is beaten only when k = 1, as
BPR needs to do very little work in that case, or when the probability of finding
occurrences becomes so high that our more complicated scheme for occurrence
checking becomes very costly. At this point we would like to note that the
occurrence checking part of our single-pattern algorithm has not yet been fully
optimized in practice.

Our multi-pattern algorithm is also shown to be very fast: in these tests it is
worse than a single-pattern algorithm only when w/2 < m and k is moderately
high with relation to the alphabet size.

Fig. 7. The plots show the average time for searching a single pattern.

8 Conclusions

Bit-parallel algorithms are currently the fastest approximate string matching
algorithms when Levenshtein distance is used. In particular, the algorithm of
Myers [6] dominates the field when the pattern is long enough, thanks to its
better packing of the search state in the bits of the computer word. In this
paper we showed how this algorithm can be modified to take advantage of the
wasted bits when the pattern is short. We have shown two ways to do this.
The first one permits searching for several patterns simultaneously. The second
one boosts the search for a single pattern by processing several text positions
simultaneously.

We have shown, both analytically and experimentally, that our algorithms
are significantly faster than all the other bit-parallel algorithms when the pattern
is short or if k is moderate with respect to the alphabet size.
Biology has become a computational science. In their efforts to understand the
functioning of cells at a molecular level, biologists make use of a growing array 
of databases that codify knowledge about genomes, the genes within them, the 
structure and function of the proteins encoded by the genes, and the interactions 
among genes, RNA molecules, proteins, molecular machines and other chemical 
components of the cell. Biologists have access to high-throughput measurement 
technologies such as DNA microarrays, which can measure the expression levels 
of tens of thousands of genes in a single experiment. 

Most computational problems in genomics do not fit the standard computer 
science paradigms in which a well-defined function is to be computed exactly or 
approximately. Rather, the goal is to determine nature’s ground truth. The ob- 
ject to be determined may be well-defined - a genomic sequence, an evolutionary 
tree, or a classification of biological samples, for example - but the criterion used 
to evaluate the result of the computation may be ill-defined and subjective. In 
such cases several different computational methods may be tried, in the hope 
that a consensus solution will emerge. Often, the goal may be simply to explore 
a body of data for significant patterns, with no predefined objective. Often great 
effort goes into understanding the particular characteristics of a single impor- 
tant data set, such as the human genome, rather than devising an algorithm 
that works for all possible data sets. Sometimes there is an iterative process of 
computation and data acquisition, in which the computation suggests patterns 
in data and experimentation generates further data to confirm or disconfirm the 
suggested patterns. Sometimes the computational method used to extract pat- 
terns from data is less important than the statistical method used to evaluate 
the validity of the discovered patterns. 

All of these characteristics hold not only in genomics but throughout the nat- 
ural sciences, but they have not received sufficient consideration within computer 
science. 

Many problems in genomics are attacked by devising a stochastic model of 
the generation of the data. The model includes both observed variables, which 
are available as experimental data, and hidden variables, which illuminate the 
structure of the data and need to be inferred from the observed variables. For 
example, the observed variables may be genomic sequences, and the hidden vari- 
ables may be the positions of genes within those sequences. Machine learning 
theory provides general methods based on maximum likelihood or maximum a
posteriori probability for estimating the hidden variables. A useful goal for
experimental algorithmics would be to characterize the performance of these general
methods.

Often a genomics problem can be viewed as piecing together a puzzle using 
diverse types of evidence. Commitments are made to those features of the solu- 
tion that are supported by the largest weight of evidence. Those commitments 
provide a kind of scaffold for the solution, and commitments are successively 
made to further features that are supported by less direct evidence but are con- 
sistent with the commitments already made. This was the approach used by 
the Celera group in sequencing the human genome and other large, complex 
genomes. 

The speaker will illustrate these themes in connection with three specific 
problems: finding the sites at which proteins bind to the genome to regulate 
the transcription of genes, finding large-scale patterns of protein-protein
interaction that are conserved in several species, and determining the variations
among individuals that occur at so-called polymorphic sites in their genomes. 
No knowledge of molecular biology will be assumed.

Abstract. The suffix array of a string T is basically a sorted list of all
the suffixes of T. Suffix arrays have been fundamental index data struc- 
tures in computational biology. If we are to search a DNA sequence in a 
genome sequence, we construct the suffix array for the genome sequence 
and then search the DNA sequence in the suffix array. In this paper, 
we consider the construction of the suffix array of T of length n where 
the size of the alphabet is fixed. It has been well-known that one can 
construct the suffix array of T in 0(n) time by constructing suffix tree 
of T and traversing the suffix tree. Although this approach takes 0(n) 
time, it is not appropriate for practical use because it uses a lot of spaces 
and it is complicated to implement. Recently, almost at the same time, 
several algorithms have been developed to directly construct suffix ar- 
rays in 0{n) time. However, these algorithms are developed for integer 
alphabets and thus do not exploit the properties given when the size of 
the alphabet is fixed. We present a fast algorithm for constructing suffix 
arrays for the fixed-size alphabet. Our algorithm constructs suffix arrays 
faster than any other algorithms developed for integer or general alpha- 
bets when the size of the alphabet is fixed. For example, we reduced the 
time required for constructing suffix arrays for DNA sequences by 25%- 
38%. In addition, we do not sacrifice the space to improve the running 
time. The space required by our algorithm is almost equal to or even less 
than those required by previous fast algorithms. 



1 Introduction 

The string searching problem is to find a pattern string P of length m in a
text string T of length n. It occurs in many practical applications and has long
been studied [7]. Recently, searching for DNA sequences in full genome sequences
has become one of the primary operations in bioinformatics. Approaches to
efficient pattern search fall into two groups. One approach is to preprocess
the pattern: preprocessing takes O(m) time and searching then takes O(n) time.
The other is to build a full-text index data structure for the text: building
the index takes O(n) time and searching takes O(m) time. The latter approach is
more appropriate than the former when we search for DNA sequences in full genome
sequences, because the text is much longer than the pattern and many patterns
have to be searched in the same text.

* This work is supported by Korea Research Foundation grant KRF-2003-03-D00343.

Two well-known such index data structures are suffix trees and suffix arrays. 
The suffix tree due to McCreight [17] is a compacted trie of all suffixes of the 
text. It was designed as a simplified version of Weiner’s position tree [22]. The 
suffix array due to Manber and Myers [16] and independently due to Gonnet et 
al. [6] is basically a sorted list of all the suffixes of the text. Since suffix arrays 
consume less space than suffix trees, suffix arrays are preferred to suffix trees. 

When we consider the complexity of index data structures, there are three
types of alphabets from which text T of length n is drawn: (i) a fixed-size
alphabet, (ii) an integer alphabet where symbols are integers in the range
[0, n^c] for a constant c, and (iii) a general alphabet in which the only
operations on string T are symbol comparisons.

We only consider fixed-size alphabets in this paper. The suffix tree of T can
be constructed in O(n) time due to McCreight [17], Ukkonen [21], Farach et
al. [2,3] and so on. The suffix array of T can be constructed in O(n) time by
constructing the suffix tree of T and then traversing it. Although this algorithm
constructs the suffix array in O(n) time, it is not appropriate for practical use
because it uses a lot of space and is complicated to implement.

Manber and Myers [16] and Gusfield [8] proposed O(n log n)-time algorithms
for constructing suffix arrays without using suffix trees. Recently, almost at the
same time, several algorithms have been developed to directly construct suffix
arrays in O(n) time. They are Kim et al.'s [14] algorithm, Ko and Aluru's [15]
algorithm, Kärkkäinen and Sanders' [13] algorithm, and Hon et al.'s [11]
algorithm. They are all based on a similar recursive divide-and-conquer scheme,
which is as follows.

1. Partition the suffixes of T into two groups A and B, and generate a string
T' such that the suffixes of T' correspond to the suffixes in A. This step
requires encoding several symbols in T into a new symbol in T'.

2. Construct the suffix array of T' recursively.

3. Construct the suffix array for A directly from the suffix array of T'.

4. Construct the suffix array for B using the suffix array for A.

5. Merge the two suffix arrays for A and B to get the suffix array of T.

Kim et al. [14] followed Farach et al.'s [3] odd-even scheme, that is, divided the
suffixes of T into odd suffixes and even suffixes to get an algorithm running in
O(n) time. Kärkkäinen and Sanders [13] used the skew scheme, that is, divided the
suffixes of T into suffixes beginning at positions i with i mod 3 ≠ 0 (group A)
and the other suffixes beginning at positions i with i mod 3 = 0 (group B). Ko
and Aluru [15] divided the suffixes of T into S-type and L-type suffixes. Their
algorithm does not require a string T' and performs steps 3-5 in a somewhat
different way. Hon et al. [11] followed the odd-even scheme. They seem to have
focused on reducing space rather than enhancing the running time. They used the
backward search to merge the succinctly represented odd and even arrays.

In practice, Ko and Aluru's algorithm and Kärkkäinen and Sanders' skew
algorithm run fast, but Kim et al.'s odd-even algorithm runs slower than the
two algorithms above. It is quite counter-intuitive that an algorithm based on
the odd-even scheme is slower than an algorithm based on the skew scheme,
because the odd-even scheme has some advantages over the skew scheme, such as
fewer recursive calls and faster encoding. Despite these advantages, the merging
step presented in Kim et al.'s algorithm is quite complicated and too slow, and
thus Kim et al.'s algorithm is slow overall. Therefore, it is natural to ask
whether there is a faster algorithm using the odd-even scheme that adopts a fast
odd-even merging algorithm.

We got an affirmative answer to this question when the size of the alphabet
is fixed. We present a fast odd-even algorithm constructing suffix arrays for
the fixed-size alphabet by developing a fast merging algorithm. Our merging
algorithm uses the backward search and thus requires a data structure for the
backward search in suffix arrays. The data structure for the backward search in
suffix arrays is quite different from the backward search in succinctly represented
suffix arrays suggested by Hon et al. [11]. Our algorithm runs in O(n log log n)
time asymptotically. However, the experiments show that our algorithm is faster
than the previous algorithms running in O(n) time. The reason for this is
that log log n is a small number in practical situations (log log n = 6 if
n = 2^64) and thus log log n can be considered to be a constant.

We describe the construction algorithm in Section 2. In Section 3, we measure 
the running time of this algorithm and compare it with those of previous algo- 
rithms. In Section 4, we further analyze our construction algorithm to explain 
why our algorithm runs so fast. In Section 5, we conclude with some remarks. 



2 Construction Algorithm 

We first introduce some notations and then we describe our construction algo- 
rithm. In describing our construction algorithm, we first describe the odd-even 
scheme, then describe the merging algorithm, and finally analyze the time com- 
plexity. 

Consider a string T of length n over an alphabet Σ. Let T[i], for 1 ≤ i ≤ n,
denote the ith symbol of string T. We assume that T[n] is a special symbol #
which is lexicographically smaller than any other symbol in Σ. The suffix array
SA_T of T is basically a sorted list of all the suffixes of T. However, the
suffixes themselves are too large to be stored, and thus only the starting
positions of the suffixes are stored in SA_T. Figure 1 shows an example of a
suffix array of aaaabbbbaaabbbaabbb#. An odd suffix of T is a suffix of T
starting at an odd position and an even suffix is a suffix starting at an even
position. The odd array SA_o and the even array SA_e of T are sorted lists of
all odd suffixes and all even suffixes, respectively.

Fig. 1. The suffix array of aaaabbbbaaabbbaabbb#. Each entry of the suffix array
is shown with its corresponding suffix. If the corresponding suffix is too long,
a prefix of it is shown rather than the whole suffix.
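
To make the definitions of SA_T, SA_o and SA_e concrete, the following is a minimal sketch that builds them by directly sorting suffix start positions. It is not the construction algorithm of this paper: the function names naiveSuffixArray and filterByParity are hypothetical, positions are 1-based as in the text, and the text is assumed to be 7-bit ASCII so that # sorts before the letters.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Naive construction, for illustration only: sort the 1-based starting
// positions of all suffixes by comparing the suffixes themselves.
// The worst case is O(n^2 log n), far from the bounds discussed in the text.
std::vector<int> naiveSuffixArray(const std::string& t) {
    std::vector<int> sa(t.size());
    std::iota(sa.begin(), sa.end(), 1);                  // positions 1..n
    std::sort(sa.begin(), sa.end(), [&t](int a, int b) {
        // Compare the suffix starting at a with the suffix starting at b.
        return t.compare(static_cast<std::size_t>(a - 1), std::string::npos,
                         t, static_cast<std::size_t>(b - 1), std::string::npos) < 0;
    });
    return sa;
}

// Keep only the suffixes whose starting position has the given parity
// (1 gives the odd array SA_o, 0 gives the even array SA_e).
// The relative order of the kept suffixes is unchanged, so the result is sorted.
std::vector<int> filterByParity(const std::vector<int>& sa, int parity) {
    std::vector<int> out;
    for (int p : sa)
        if (p % 2 == parity) out.push_back(p);
    return out;
}
```

For T = aaaabbbbaaabbbaabbb#, the first entry of the result is 20, the position of the suffix consisting only of #.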



2.1 Odd-Even Scheme 

We describe the odd-even scheme by elaborating the recursive divide-and- 
conquer scheme presented in the previous section. 

1. Encode the given string T into a half-sized string T'. We encode T into T'
by replacing each pair of adjacent symbols (T[2i - 1], T[2i]), 1 ≤ i ≤ n/2,
with a new symbol. T is encoded into T' as follows.

• Sort the pairs of adjacent symbols (T[2i - 1], T[2i]) lexicographically
and then remove duplicates: We use radix-sort to sort the pairs and
we perform a scan on the sorted pairs to remove duplicates. Both the
radix-sort and the scan take O(n) time.

• Map the ith lexicographically smallest pair of adjacent symbols into
integer i. The integer i is in the range [1, n/2] because the number of
pairs is at most n/2.

• Replace (T[2i - 1], T[2i]) with the integer it is mapped into.

Fig. 2 shows how to encode T = aaaabbbbaaabbbaabbb# of length 20. After
we sort the pairs (T[1], T[2]), (T[3], T[4]), ..., (T[19], T[20]) and remove
duplicates, we are left with 4 distinct pairs, which are aa, ab, b#, and bb.
We map aa, ab, b#, and bb into 1, 2, 3, and 4, respectively. Then, we get
T' = 1144124143 of length 10.

2. Construct the suffix array SA_T' of T' recursively.

3. Construct the odd array SA_o of T from SA_T'. Since the ith suffix of T'
corresponds to the (2i - 1)st suffix of T, we get SA_o[k] by computing
2 SA_T'[k] - 1 for all k. For example, SA_o[2] = 2 SA_T'[2] - 1 = 9 in Fig. 2.

4. Construct the even array SA_e of T from the odd array SA_o. An even suffix
is one symbol followed by an odd suffix. For example, the 8th suffix of T is
T[8] followed by the 9th suffix of T. We make tuples for even suffixes: the
first element of a tuple is T[2i] and the second element is the (2i + 1)st
suffix of T. First, we sort the tuples by the second elements (this result is
given in SA_o). Then we stably sort the tuples by the first elements and we
get SA_e.

5. Merge the odd array SA_o and the even array SA_e.

Fig. 2. An example of constructing the suffix array for aaaabbbbaaabbbaabbb#.

The odd-even scheme takes O(n) time except for the merging step [14].
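
Steps 1, 3 and 4 of the scheme above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the names encodePairs, oddArrayFromSub and evenArrayFromOdd are hypothetical, std::sort and std::stable_sort stand in for the radix sorts of the paper (so the sketch costs O(n log n) rather than O(n)), positions are 1-based, and the text is assumed to be 7-bit ASCII.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Step 1: name each pair (T[2i-1], T[2i]) by its rank among the distinct pairs.
// Assumes |T| is even. Names start at 1, as in the example of the text.
std::vector<int> encodePairs(const std::string& t) {
    std::vector<std::pair<char, char>> pairs;
    for (std::size_t i = 0; i + 1 < t.size(); i += 2)
        pairs.emplace_back(t[i], t[i + 1]);
    std::vector<std::pair<char, char>> names(pairs);
    std::sort(names.begin(), names.end());               // stands in for radix sort
    names.erase(std::unique(names.begin(), names.end()), names.end());
    std::vector<int> encoded;
    for (const auto& p : pairs) {
        auto it = std::lower_bound(names.begin(), names.end(), p);
        encoded.push_back(static_cast<int>(it - names.begin()) + 1);
    }
    return encoded;
}

// Step 3: the ith suffix of T' corresponds to the (2i-1)st suffix of T.
std::vector<int> oddArrayFromSub(const std::vector<int>& saSub) {
    std::vector<int> saOdd;
    for (int i : saSub) saOdd.push_back(2 * i - 1);
    return saOdd;
}

// Step 4: an even suffix 2i is T[2i] followed by the odd suffix 2i+1.
// Listing the even suffixes in the order of their odd successors and then
// stably sorting by the first symbol yields the even array.
std::vector<int> evenArrayFromOdd(const std::string& t, const std::vector<int>& saOdd) {
    std::vector<int> saEven;
    if (t.size() % 2 == 0)                               // suffix n is followed by the empty suffix
        saEven.push_back(static_cast<int>(t.size()));
    for (int p : saOdd)
        if (p > 1) saEven.push_back(p - 1);
    std::stable_sort(saEven.begin(), saEven.end(), [&t](int a, int b) {
        return t[a - 1] < t[b - 1];
    });
    return saEven;
}
```

For the running example, encodePairs("aaaabbbbaaabbbaabbb#") yields 1144124143, matching T' in the text.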



2.2 Merging 

We first describe the backward search, on which our merging algorithm is based,
and then describe our merging algorithm.

The backward search finds the location of a pattern P = p_1 p_2 ··· p_m in
the suffix array SA_T by scanning P one symbol at a time in reverse order: we
first find the location of p_m in SA_T, then the location of p_{m-1} p_m, then
the location of p_{m-2} p_{m-1} p_m, and so on. We repeat this procedure until
we find the location of p_1 p_2 ··· p_m. One advantage of the backward search is
that the locations of all suffixes of P are found while we are finding the
location of P.
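
The idea of extending a match one symbol to the left can be sketched as follows. This is only an illustration: it swaps in two binary searches over a plain suffix array and its inverse, costing O(m log n) for a pattern of length m, and it is not the data structure used in this paper. The names buildSuffixArray and backwardSearch are hypothetical, and positions are 0-based here.

```cpp
#include <algorithm>
#include <numeric>
#include <string>
#include <utility>
#include <vector>

// Naive suffix array with 0-based positions, used only to feed the sketch.
std::vector<int> buildSuffixArray(const std::string& t) {
    std::vector<int> sa(t.size());
    std::iota(sa.begin(), sa.end(), 0);
    std::sort(sa.begin(), sa.end(), [&t](int a, int b) {
        return t.compare(a, std::string::npos, t, b, std::string::npos) < 0;
    });
    return sa;
}

// Returns the half-open range [lo, hi) of entries of sa whose suffixes start
// with p. The pattern is scanned from its last symbol to its first.
std::pair<int, int> backwardSearch(const std::string& t, const std::vector<int>& sa,
                                   const std::string& p) {
    using Key = std::pair<unsigned char, int>;
    const int n = static_cast<int>(t.size());
    if (p.empty()) return {0, n};
    std::vector<int> inv(n + 1);
    for (int i = 0; i < n; ++i) inv[sa[i]] = i;          // inverse suffix array
    inv[n] = -1;                                         // the empty suffix precedes all others
    // Suffix q is its first symbol followed by suffix q+1, so sa is sorted
    // by the key (t[q], inv[q+1]).
    auto keyOf = [&](int q) {
        return Key(static_cast<unsigned char>(t[q]), inv[q + 1]);
    };
    int lo = -1, hi = n;                                 // ranks matching the empty pattern
    for (int j = static_cast<int>(p.size()) - 1; j >= 0 && lo < hi; --j) {
        const Key low(static_cast<unsigned char>(p[j]), lo);
        const Key high(static_cast<unsigned char>(p[j]), hi);
        auto first = std::partition_point(sa.begin(), sa.end(),
                                          [&](int q) { return keyOf(q) < low; });
        auto last = std::partition_point(sa.begin(), sa.end(),
                                         [&](int q) { return keyOf(q) < high; });
        lo = static_cast<int>(first - sa.begin());
        hi = static_cast<int>(last - sa.begin());
    }
    return {lo, hi};
}
```

After each iteration, [lo, hi) is the location of the suffix p_j ··· p_m of the pattern, which is why the locations of all suffixes of P come for free.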

The backward search was introduced by Ferragina and Manzini [4,5]. They
used it to develop an opportunistic data structure that uses the Burrows-Wheeler
transformation [1]. The backward search has been used to search patterns in the
succinctly represented suffix arrays [9,18,19] suggested by Grossi and
Vitter [10]. Hon et al. [11] used the backward search to merge two succinctly
represented suffix arrays. For suffix arrays that are not succinctly
represented, Sim et al. [20] developed a data structure that allows the backward
search of P in SA_T to be performed in O(m log |Σ|) time after preprocessing SA_T
in O(n) time. We use this data structure to merge the odd and even arrays
because it is appropriate for constructing suffix arrays fast.

306 D.K. Kim, J. Jo, and H. Park

Now, we describe how to merge the odd array SA_o and the even array SA_e.
Merging SA_o and SA_e consists of two steps and requires an additional array
C[1..n/2 + 1].

Step 1. We count the number of even suffixes that are larger than the odd suffix
SA_o[i - 1] and smaller than the odd suffix SA_o[i] for all 2 ≤ i ≤ n/2 and store
the number in C[i]. We store in C[1] the number of even suffixes smaller than
SA_o[1] and in C[n/2 + 1] the number of even suffixes larger than SA_o[n/2]. To
compute C[i], 1 ≤ i ≤ n/2 + 1, we use the backward search as follows.

1. Generate a data structure supporting the backward search in SA_T'.

2. Generate T'' where each symbol T''[i], 1 ≤ i ≤ n/2, is an encoding of
(T[2i], T[2i + 1]). (Note that the number of even suffixes larger than
SA_o[i - 1] and smaller than SA_o[i] is the same as the number of suffixes of
T'' larger than SA_T'[i - 1] and smaller than SA_T'[i].)

3. Initialize all the entries of array C[1..n/2 + 1] to zero.

4. Perform the backward search of T'' in SA_T'. During the backward search,
we increment C[i] if a suffix of T'' is larger than SA_T'[i - 1] and smaller
than SA_T'[i].

Step 2. We store the suffixes in SA_o and SA_e into SA_T using array C. Let p_i,
1 ≤ i ≤ n/2 + 1, denote the prefix sum C[1] + ··· + C[i]. We should store the odd
suffix SA_o[i] into SA_T[i + p_i] because p_i even suffixes are smaller than
SA_o[i]. We should store the even suffixes SA_e[p_i + 1..p_{i+1}] into
SA_T[i + p_i + 1..i + p_{i+1}] because i odd suffixes are smaller than the even
suffix SA_e[j], p_i + 1 ≤ j ≤ p_{i+1}. To store the odd and even suffixes into
SA_T in O(n) time, we proceed as follows. We store the suffixes into SA_T from
the smallest to the largest. We first store the C[1] smallest even suffixes into
SA_T and then we store the odd suffix SA_o[1] into SA_T. Then, we store the next
C[2] smallest even suffixes and then store the odd suffix SA_o[2]. We repeat this
procedure until all the odd and even suffixes are stored in SA_T. Consider the
example in Fig. 2. Since C[1] = 1, we store the smallest even suffix SA_e[1] into
SA_T[1] and store the odd suffix SA_o[1] into SA_T[2]. Since C[2] = 0, we store
no even suffixes and then store the odd suffix SA_o[2] into SA_T[3].
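
Step 2 can be sketched as a single linear pass. The function name mergeWithCounts is hypothetical, and the arrays are 0-based C++ vectors, so c[i] here plays the role of C[i + 1] in the text; the sketch assumes the counts in c have already been computed by Step 1.

```cpp
#include <cstddef>
#include <vector>

// Step 2 of the merge. saOdd and saEven hold suffix positions in sorted order.
// c has saOdd.size() + 1 entries: c[i] counts the even suffixes that fall
// immediately before the i-th odd suffix, and c.back() counts the even
// suffixes larger than the last odd suffix.
std::vector<int> mergeWithCounts(const std::vector<int>& saOdd,
                                 const std::vector<int>& saEven,
                                 const std::vector<int>& c) {
    std::vector<int> sa;
    sa.reserve(saOdd.size() + saEven.size());
    std::size_t e = 0;                                   // next unplaced even suffix
    for (std::size_t i = 0; i <= saOdd.size(); ++i) {
        for (int k = 0; k < c[i]; ++k)                   // the next c[i] smallest even suffixes
            sa.push_back(saEven[e++]);
        if (i < saOdd.size())                            // then the i-th odd suffix
            sa.push_back(saOdd[i]);
    }
    return sa;
}
```

For the small text abaab#, with SA_o = (3, 1, 5), SA_e = (6, 4, 2) and counts (1, 1, 0, 1), the call returns (6, 3, 4, 1, 5, 2). Every suffix is written exactly once, which is where the O(n) bound for this step comes from.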

We consider the time complexity of this merging algorithm. Since the backward
search in step 1 takes O(n log |Σ'|) time, where Σ' is the alphabet of T', and
the other parts of the merging step take O(n) time, we get the following
lemma.

Lemma 1. This merging algorithm takes O(n log |Σ'|) time.

2.3 Time Complexity

We consider the time complexity of our algorithm. Since the odd-even scheme
takes O(n) time except for the merging [14], we only consider the time required
for merging in all recursive calls. We first compute the time required for
merging in each recursive call. Let $T_i$ denote the text and $\Sigma_i$ denote
the alphabet in the ith recursive call. We generalize Lemma 1 as follows.

Corollary 1. The merging step in the ith recursive call takes
$O(|T_i| \log |\Sigma_{i+1}|)$ time.

Now, we compute $|T_i|$ and an upper bound on $|\Sigma_i|$. Since the length of
the text in the ith recursive call is half the length of the text in the
(i - 1)st recursive call,

$$|T_i| = n/2^{i-1}. \qquad (1)$$

Since the size of the alphabet in the ith call is at most the square of the
alphabet size in the (i - 1)st call,

$$|\Sigma_i| \le |\Sigma|^{2^{i-1}}. \qquad (2)$$

Since $|\Sigma_i|$ cannot be larger than $|T_i|$, by equation (1),

$$|\Sigma_i| \le n/2^{i-1}. \qquad (3)$$

We first compute the time required for merging in the first log log n recursive 
calls and then the time required in the other recursive calls. 

Lemma 2. The merging steps in the first log log n recursive calls take
O(n log log n) time.

Proof. We first show that the merging step in each of these recursive calls takes
O(n) time. The time required for the merging step in the ith recursive call,
1 ≤ i ≤ log log n, is $O(|T_i| \log |\Sigma_{i+1}|)$ by Corollary 1. Since
$|T_i| = n/2^{i-1}$ and $|\Sigma_{i+1}| \le |\Sigma|^{2^i}$ by equations (1)
and (2),

$$O(|T_i| \log |\Sigma_{i+1}|) = O(n/2^{i-1} \cdot \log |\Sigma|^{2^i}) = O(n \cdot \log |\Sigma|).$$

Since we only consider the case that $|\Sigma|$ is fixed, this is O(n). Thus, the
total time required for merging in the first log log n recursive calls is
O(n log log n).

Lemma 3. The merging steps in all the ith recursive calls, i > log log n, take
O(n) time.

Proof. In the ith recursive call, i > log log n, the merging step takes
$O(n/2^{i-1} \cdot \log(n/2^i))$ time by Corollary 1 and equations (1) and (3).
If we replace i by log log n + j for j > 0,

$$O(n/2^{i-1} \cdot \log(n/2^i)) = O\big(n/(2^{j-1} \log n) \cdot \log(n/2^{\log\log n + j})\big).$$

Since $\log n > \log(n/2^{\log\log n + j})$, this is $O(n/2^{j-1})$.

Thus, the total time required for the merging steps in all the ith recursive
calls, i > log log n, is O(n + n/2 + n/4 + ···) = O(n).

By Lemmas 2 and 3, we get the following theorem.

Theorem 1. Our construction algorithm runs in O(n log log n) time.

3 Experimental Results

We measure the running time of our construction algorithm and compare it with
those of the previous algorithms due to Manber and Myers (MM), Ko and Aluru
(KA), and Kärkkäinen and Sanders (KS).

We ran experiments on both random strings and DNA sequences. We generated
random strings that differ in length (1 million, 5 million, 10 million,
30 million, and 50 million) and in the size of the alphabet (2, 4, 64, and 128)
from which they are drawn. For each pair of text length and alphabet size, we
generated 100 random strings and ran experiments on them. We also selected six
DNA sequences of lengths 3.2M, 3.6M, 4.7M, 12.2M, 16.9M, and 31.0M,
respectively. The data obtained from experiments on random strings are given in
Table 1 and those on DNA sequences are given in Table 2. We measured the running
time in milliseconds on a 2.8 GHz Pentium IV with 2GB of main memory.



Table 1. The data obtained from experiments on random strings. We used the C
language to implement the algorithms. The code implementing algorithm MM was
received from Myers and the code implementing algorithm KS was obtained from
Sanders' homepage (http://www.mpi-sb.mpg.de/~sanders/programs/suffix/). The code
for algorithm KA and for our algorithm was implemented by us. We tried to make
our implementation of algorithm KA run as fast as possible.



                             |Σ| = 2                                     |Σ| = 4
Length (n)         MM         KA         KS       Ours         MM         KA         KS       Ours
1M             11,964      1,808      2,045      1,717     13,145      2,350      3,347      1,681
5M             70,916     10,983     12,417      8,736     75,897     12,058     15,033      9,419
10M           150,923     22,555     25,314     18,475    106,492     24,780     38,413     19,630
30M           573,659     72,010     79,789     59,824    576,364     79,205     94,888     62,664
50M         1,007,094        N/A    137,908    103,717  1,106,789        N/A    162,591    108,419

                             |Σ| = 64                                    |Σ| = 128
Length (n)         MM         KA         KS       Ours         MM         KA         KS       Ours
1M              1,439      2,933      1,850      2,149      2,305      3,183      1,950      2,241
5M             83,683     19,403     20,200     13,636     13,853     15,680     10,411     12,367
10M           175,174     38,503     40,416     26,413     25,347     42,677     42,845     29,347
30M           595,766    118,114    126,206     81,706     86,215    127,984    134,980     86,227
50M         1,094,356        N/A    211,792    141,276    158,491        N/A    226,425    152,249


Table 1 shows that our algorithm runs faster than the other algorithms we
tested in most cases of random strings. Our algorithm is slower than the other
algorithms when |Σ| = 128 and n ≤ 10M and when |Σ| = 64 and n = 1M, i.e.,
when the size of the alphabet is large and the length of the text is rather
small. Thus, our algorithm runs faster than the other algorithms when the size
of the alphabet is small or the length of the text is large. This implies that
our algorithm is appropriate for constructing the suffix arrays of DNA
sequences. Table 2 and Fig. 3 show the results of experiments made on DNA
sequences. On average, our algorithm requires 38% less time than algorithm KS
and 25% less time than algorithm KA.

Table 2. The data obtained from experiments on six DNA sequences

DNA string     mito.nt      vector    ecoli.nt    yeast.nt   month.est   month.est
                                                                 human       mouse
(length)        (3.2M)      (3.6M)      (4.7M)     (12.2M)     (16.9M)     (31.0M)
KA               8,234       7,781      11,750      32,609      41,094      87,656
KS               9,531      10,062      14,640      40,734      53,125     101,756
Ours             5,500       6,828       8,453      23,875      32,969      66,031

Fig. 3. A graphic representation of the data in Table 2 (DNA sequences,
|Σ| = 4; horizontal axis: DNA (size)).

We consider the space required by the algorithms. Our algorithm, algorithm
KA, and algorithm KS require O(n) space asymptotically. To compare the hidden
constants in the asymptotic notation, we estimate the constant of each algorithm
by increasing the length of the text until the algorithm starts to use virtual
memory in secondary storage. Our algorithm and algorithm KS start to use
virtual memory when n is about 70M, and algorithm KA starts to use it when n is
about 40M. (Thus, we got no data for algorithm KA when n = 50M in Table 1.)
Hence, the space required by our algorithm is almost equal to or even less than
that required by previous fast algorithms. This implies that we do not sacrifice
space for running time.

4 Discussion

We explain why our algorithm runs so fast. First, the odd-even scheme on which 
our algorithm is based is quite efficient. We show this by comparing the odd- 
even scheme with the skew scheme. Second, the part of our algorithm whose 
time complexity is O(n log log n), i.e., the backward search in the merging step,
does not dominate the total running time of our algorithm. 

4.1 Odd-Even Scheme vs. Skew Scheme 

We compare the odd-even scheme with the skew scheme. The two essential
differences between the odd-even scheme and the skew scheme are that the skew
scheme encodes T into T' of length 2n/3 and that each symbol of T' is an
encoding of three symbols in T. These two differences make the skew scheme
slower than the odd-even scheme. The two major reasons are as follows.

• The sum of the lengths of the strings to be encoded in all recursive calls
in the skew scheme is 1.5 times that in the odd-even scheme. In the skew
scheme, the size of the text is reduced to 2/3 of its length in each
recursive call, and thus the sum of the lengths of the encoded strings is
3n (= n + 2n/3 + 4n/9 + ···), while in the odd-even scheme it is 2n
(= n + n/2 + n/4 + ···).

• The skew scheme performs a 3-round radix sort to sort triples, while the
odd-even scheme performs a 2-round radix sort to sort pairs.

With these two things combined, the encoding in the odd-even scheme is expected
to be considerably faster than that in the skew scheme. We implemented the
encoding step in the odd-even scheme and the encoding step in the skew scheme
and measured their running times. We summarize the results in Table 3.
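
The factor of 1.5 between the two geometric series can be checked numerically with a small sketch. The function name totalEncodedLength is hypothetical, and the integer division and the stop at length 1 are simplifications of this sketch rather than properties of either scheme.

```cpp
// Sum of the text lengths over all recursive calls when each call shrinks the
// text to num/den of its length (1/2 for the odd-even scheme, 2/3 for the
// skew scheme). The recursion is cut off once the length drops to 1.
long long totalEncodedLength(long long n, long long num, long long den) {
    long long total = 0;
    while (n > 1) {
        total += n;
        n = n * num / den;                               // integer division rounds down
    }
    return total;
}
```

For n = 2^20 the odd-even total is 2^21 - 2, just under 2n, while the skew total stays just under 3n, so their ratio is close to 1.5.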



Table 3. Comparison of the encoding time. The first table is the result when the
size of the alphabet is 4 and the second table is the result when the size of
the alphabet is 128.

|Σ| = 4
n                 1M        5M       10M       30M       50M
odd-even       1,078     4,110     8,719    28,703    47,202
skew           1,860    10,672    21,233    64,956   110,438

|Σ| = 128
n                 1M        5M       10M       30M       50M
odd-even       1,406     6,249    16,390    45,034    79,563
skew           1,608     7,985    35,094   107,578   179,326


Table 3 shows that the encoding in the odd-even scheme is about 2 times
faster than that in the skew scheme. We carefully implemented the two schemes
so that anything that is not inherent to the schemes, such as code tuning,
cannot affect the running time. We attached the C code used to implement
these two schemes in the appendix. (The code for encoding in the skew scheme
is the same as that presented by Kärkkäinen and Sanders [13].)

4.2 The Ratio of O(n log log n)-Time Backward Search

We compute the ratio of the running time of the backward search, which runs
in O(n log log n) time, to the total running time of our algorithm. Table 4
shows that the backward search consumes on average 34% of the total running time
when |Σ| = 4 and 27% when |Σ| = 128. Hence, the running time of the backward
search is not crucial to the total running time.
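
As a quick arithmetic check, the two averages quoted above can be recomputed from the backward search and total running times reported in Table 4. The helper name meanRatioPercent is hypothetical; it simply averages the per-length ratios.

```cpp
#include <cstddef>
#include <vector>

// Mean of part[i] / total[i] over all i, expressed in percent.
// Assumes both vectors have the same nonzero length and no total is zero.
double meanRatioPercent(const std::vector<double>& part,
                        const std::vector<double>& total) {
    double sum = 0.0;
    for (std::size_t i = 0; i < part.size(); ++i)
        sum += part[i] / total[i];
    return 100.0 * sum / static_cast<double>(part.size());
}
```

Feeding it the backward search row (829, 4,999, 5,874, 20,157, 35,390) and the total row (2,376, 11,859, 19,220, 63,767, 106,984) for |Σ| = 4 gives a value between 34% and 35%; the rows for |Σ| = 128 give a value close to 27%.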



Table 4. The ratio of the running time of the backward search to the total running
time. The first table is the result when the size of the alphabet is 4 and the second
table is the result when the size of the alphabet is 128.

|Σ| = 4
n                       1M        5M       10M       30M       50M
backward search        829     4,999     5,874    20,157    35,390
total                2,376    11,859    19,220    63,767   106,984
rate                34.89%    42.15%    30.56%    31.61%    33.08%

|Σ| = 128
n                       1M        5M       10M       30M       50M
backward search      1,078     5,876     5,501    17,687    31,283
total                3,126    14,438    29,892    85,953   154,469
rate                34.48%    40.70%    18.40%    20.58%    20.25%


5 Concluding Remarks 

We presented a fast algorithm for constructing suffix arrays for the fixed-size
alphabet. Our algorithm constructs suffix arrays faster than any other algorithm
developed for integer or general alphabets when the alphabet is of fixed size.
In constructing suffix arrays of DNA sequences, our algorithm runs 1.33 to 1.6
times faster than the previous fast algorithms without sacrificing space.

In this paper, we considered the suffix array of T as the lexicographically
sorted list of the suffixes of T. Sometimes, the suffix array is defined as a pair of
two arrays, which are the sorted array and the lcp array. The sorted array stores
the sorted list of the suffixes and the lcp array stores the lcps (longest common
prefixes) of the suffixes. If the lcp array is needed, it can be computed from the
sorted array in O(n) time due to Kasai et al. [12].

Besides, there is another line of research related to reducing the space of suffix
arrays, which uses only O(n)-bit space to represent a suffix array. Since we only
focused on fast construction of suffix arrays in this paper, we used
O(n log n)-bit space. It will be interesting to develop a practically fast
construction algorithm that uses only O(n)-bit space.

Appendix



Radix Sort (shared by both schemes)

// stably sort a[0..n-1] to b[0..n-1] with keys in 0..K from r
static void radixPass(long* a, long* b, long* r, long n, long K)
{   // count occurrences
    long i, sum;
    long* c = new long[K + 1];                      // counter array
    for (i = 0; i <= K; i++) c[i] = 0;              // reset counters
    for (i = 0; i < n; i++) c[r[a[i]]]++;           // count occurrences
    for (i = 0, sum = 0; i <= K; i++) {             // exclusive prefix sums
        long t = c[i]; c[i] = sum; sum += t;
    }
    for (i = 0; i < n; i++)                         // sort
        b[c[r[a[i]]]++] = a[i];
    delete [] c;
}

Code for encoding in the skew scheme

// lsb radix sort the mod 1 and mod 2 triples
radixPass(s12 , SA12, s+2, n02, K);
radixPass(SA12, s12 , s+1, n02, K);
radixPass(s12 , SA12, s  , n02, K);

// find lexicographic names of triples
int name = 0;
int c0 = -1, c1 = -1, c2 = -1;
for (i = 0; i < n02; i++)
{   if (s[SA12[i]] != c0 || s[SA12[i]+1] != c1 || s[SA12[i]+2] != c2) {
        name++; c0 = s[SA12[i]]; c1 = s[SA12[i]+1]; c2 = s[SA12[i]+2];
    }
    if (SA12[i] % 3 == 1)
         s12[SA12[i]/3] = name;        // left half
    else s12[SA12[i]/3 + n0] = name;   // right half
}

Code for encoding in the odd-even scheme

// radix sort
radixPass(SA, encodedText, s+1, n+1, K);
radixPass(encodedText, SA, s, n+1, K);

// find lexicographic names of couples
int name = 0, c0 = -1, c1 = -1;
for (i = 0; i <= n; i++)
{   if (SA[i] % 2 != 0) continue;
    if (s[SA[i]] != c0 || s[SA[i]+1] != c1) {
        name++; c0 = s[SA[i]]; c1 = s[SA[i]+1];
    }
    textEven[SA[i]/2] = name;
}   // We construct the even array recursively rather than the odd array
    // because 0 is the starting index of an array in the C language.

Abstract. We are interested in the graph coloring problem. We studied the
effectiveness of some pre-processings that are specific to the k-colorability
problem and that promise to reduce the size or the difficulty of the
instances. We propose to apply to the reduced graph an exact method based on
a linear-decomposition of the graph. We present some experiments performed on
instances from the literature, among which are instances of the DIMACS
library.



1 Introduction 

The Graph Coloring Problem is a central problem in many applications such as
school timetabling, scheduling, or frequency assignment [5,6]. This problem
belongs to the class of NP-hard problems [10]. Various heuristic approaches
have been proposed to solve it (see for instance [2,8,9,11,13,17,19,21]).
Efficient exact methods are less numerous: implicit enumeration strategies
[14,20,22], column generation and linear programming [18], branch-and-bound [3],
branch-and-cut [7], without forgetting the well-known exact version of Brélaz's
DSATUR [2].

A coloring of a graph G = (V, E) is an assignment of a color c(x) to each
vertex such that c(x) ≠ c(y) for all edges (x, y) ∈ E. If the number of colors
used is k, the coloring of G is called a k-coloring. The minimum value of k for
which a k-coloring is possible is called the chromatic number of G and is denoted
χ(G). The graph coloring problem consists in finding the chromatic number of
a graph. Our approach to solve this problem is to solve, for different values of
k, the k-colorability problem: "does there exist a k-coloring of G?".
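
The relation between the chromatic number and the k-colorability problem can be illustrated with a small brute-force sketch. It is not the linear-decomposition method of this paper: the names colorFrom, isKColorable and chromaticNumber are hypothetical, the graph is given as adjacency lists over vertices 0..N-1, and the backtracking search is exponential, so it is only usable on tiny graphs.

```cpp
#include <cstddef>
#include <vector>

// Try to color vertices v, v+1, ... with colors 0..k-1, given that the
// vertices before v are already colored; uncolored vertices hold -1.
static bool colorFrom(const std::vector<std::vector<int>>& adj,
                      std::vector<int>& color, std::size_t v, int k) {
    if (v == adj.size()) return true;
    for (int c = 0; c < k; ++c) {
        bool free = true;
        for (int u : adj[v])
            if (color[u] == c) { free = false; break; }
        if (!free) continue;
        color[v] = c;
        if (colorFrom(adj, color, v + 1, k)) return true;
    }
    color[v] = -1;                                       // undo before backtracking
    return false;
}

// The k-colorability question: does there exist a k-coloring of the graph?
bool isKColorable(const std::vector<std::vector<int>>& adj, int k) {
    std::vector<int> color(adj.size(), -1);
    return colorFrom(adj, color, 0, k);
}

// The chromatic number is the smallest k for which the answer is yes.
int chromaticNumber(const std::vector<std::vector<int>>& adj) {
    const int n = static_cast<int>(adj.size());
    for (int k = 0; k < n; ++k)
        if (isKColorable(adj, k)) return k;
    return n;                                            // N colors always suffice
}
```

A triangle needs 3 colors, a path on three vertices needs 2, and a cycle on five vertices needs 3 because it is not 2-colorable.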

We propose to evaluate experimentally the effectiveness of some pre-processings
that are directly related to the k-colorability problem. The aim of these
pre-processings is to reduce the size of the graph by deleting vertices and to
constrain it by adding edges. Then we apply a linear-decomposition algorithm to
the reduced graph in order to solve the graph coloring problem. This method is
strongly related to the notions of tree-decomposition and path-decomposition,
well studied by Bodlaender [1]. Linear-decomposition has been implemented
efficiently by Carlier, Lucet and Manouvrier to solve various NP-hard problems
[4,15,16], and its main advantage is that the exponential factor of its
complexity depends on the linearwidth of the graph but not on its size.

* With the support of Conseil Regional de Picardie and FSE.

Our paper is organized as follows. We present in Sect. 2 some pre-processings
related to the k-colorability problem and test their effectiveness on various
benchmark instances. In Sect. 3, we describe our linear-decomposition algorithm.
We report the results of our experiments in Sect. 4. Finally, we conclude and
discuss the perspectives of this work.

2 Pre-processings 

In this section, we present several pre-processings to reduce the difficulty of a k- 
colorability problem. These pre-processings are iterated until the graph remains 
unchanged or the whole graph is reduced. 

2.1 Definitions 

An undirected graph $G$ is a pair $(V, E)$ made up of a vertex set $V$ and an
edge set $E \subseteq V \times V$. Let $N = |V|$ and $M = |E|$. A graph $G$ is
connected if for all vertices $w, v \in V$ ($w \neq v$), there exists a path from
$w$ to $v$. Without loss of generality, the graphs considered in the remainder of
this paper are undirected and connected. Given a graph $G = (V, E)$ and a vertex
$x \in V$, let $\vartheta(x) = \{y \in V \mid (x, y) \in E\}$; $\vartheta(x)$
represents the neighborhood of $x$ in $G$. The subgraph of $G = (V, E)$ induced
by $I \subseteq V$ is the graph $G(I) = (I, E_I)$ such that
$E_I = E \cap (I \times I)$. A clique of $G = (V, E)$ is a subset
$C \subseteq V$ such that every two vertices in $C$ are joined by an edge in $E$.
Let $\bar{E} = (V \times V) \setminus E$ be the set made up of all pairs of
vertices that are not neighbors in $G = (V, E)$. Let $d$ be the degree of $G$,
i.e. the maximal vertex degree among all vertices of $G$.

2.2 Reduction 1 

A vertex reduction using the following property of the neighborhood of the
vertices can be applied to the graph before any other computation, with time
complexity $O(|\bar{E}| \cdot d)$, upper bounded by $O(N^3)$. Given a graph $G$,
for each pair of vertices $x, y \in V$ such that $(x, y) \in \bar{E}$, if
$\vartheta(y) \subseteq \vartheta(x)$ then $y$ and its adjacent edges can be
erased from the graph. Indeed, suppose that $k - 1$ colors are needed to color
the neighbors of $x$. The vertex $x$ can take the $k$-th color. Vertices $x$ and
$y$ are not neighbors. Moreover, the neighbors of $y$ are already colored with at
most $k - 1$ colors, so $y$ can take the same color as $x$. So, if
$G \setminus \{y\}$ is k-colorable then $G$ is k-colorable as well and we can
delete $y$ from the graph. This principle can be applied recursively as long as
vertices are removed from the graph.
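
The neighborhood-inclusion rule can be sketched in a few lines of Python. This is only an illustration, not the authors' C implementation: the adjacency-dictionary representation and the function name `reduction_1` are assumptions made for the example, and the simple double loop makes no attempt to reach the complexity bound quoted above.

```python
def reduction_1(adj):
    """Delete every vertex y dominated by a non-adjacent vertex x.

    adj maps each vertex to the set of its neighbors (undirected graph).
    Returns a reduced copy; the input dictionary is left untouched.
    """
    adj = {v: set(ns) for v, ns in adj.items()}
    changed = True
    while changed:  # repeat as long as some vertex was removed
        changed = False
        for y in list(adj):
            for x in adj:
                # x and y distinct, non-adjacent, neighborhood of y inside that of x
                if x != y and x not in adj[y] and adj[y] <= adj[x]:
                    for z in adj[y]:
                        adj[z].discard(y)
                    del adj[y]
                    changed = True
                    break  # y is gone; move on to the next candidate
    return adj


# Path a-b-c: the neighborhood of a is included in that of c, so a is removed.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
reduced_path = reduction_1(path)
```

On a cycle of five vertices no neighborhood is included in another, so the graph is returned unchanged.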

2.3 Reduction 2 

Suppose that we are searching for a k-coloring of a graph $G = (V, E)$. Then we
can use the following property: for each vertex $x$, if the degree of $x$ is
strictly lower than $k$, $x$ and its edges can be erased from the graph [11].
Assume $x$ has $k - 1$ neighbors. In the worst case, those neighbors must have
different colors. Then the vertex $x$ can take the $k$-th color. It does not
interfere with the coloring of the remaining vertices because all its neighbors
have already been colored. Therefore we can consider from the beginning that it
will take a color unused by its neighbors and delete it from the graph before the
coloring. The time complexity of this reduction is $O(N)$. We apply this
principle recursively by examining the remaining vertices until the graph is
totally reduced or no further vertex can be deleted.
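
A minimal sketch of this degree-based reduction, under the same assumptions as before (adjacency dictionary, hypothetical function name `reduction_2`). A work list holds the vertices whose degree is below $k$; deleting one of them may push its neighbors below the threshold, which is how the recursive application is obtained.

```python
def reduction_2(adj, k):
    """Repeatedly delete vertices of degree strictly lower than k.

    adj maps each vertex to the set of its neighbors; a reduced copy is returned.
    """
    adj = {v: set(ns) for v, ns in adj.items()}
    stack = [v for v in adj if len(adj[v]) < k]
    while stack:
        x = stack.pop()
        if x not in adj:  # already deleted through an earlier stack entry
            continue
        for z in adj.pop(x):
            adj[z].discard(x)
            if len(adj[z]) < k:  # z may now be removable as well
                stack.append(z)
    return adj


# Triangle 1-2-3 with a pendant vertex 4 attached to vertex 3.
g = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
kept_for_2 = reduction_2(g, 2)  # only the pendant vertex disappears
kept_for_3 = reduction_2(g, 3)  # the whole graph is reduced
```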



2.4 Vertex Fusion 

Suppose that we are searching for a k-coloring of $G = (V, E)$ and that a clique
$C$ of size $k$ has been previously determined. For each pair of non-adjacent
vertices $x, y \in V$ such that $x \notin C$ and $y \in C$, if $x$ is adjacent to
all vertices of $C \setminus \{y\}$ then $x$ and $y$ can be merged in the
following way: each neighbor of $x$ becomes a neighbor of $y$, then $x$ and its
adjacent edges are erased from the graph. Indeed, since we are searching for a
k-coloring, $x$ and $y$ must have the same color. Then
$\forall z \in \vartheta(x)$, $c(y) \neq c(z)$ and the edge $(y, z)$ can be added
to $G$. Then $\vartheta(x) \subseteq \vartheta(y)$ and $x$ can be erased from the
graph (cf. Sect. 2.2). The time complexity of this pre-processing is
$O(N \cdot k)$.
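
The fusion rule can be illustrated as follows. The sketch assumes that the clique $C$ is given as a set of vertices and that the graph is an adjacency dictionary; the name `vertex_fusion` is hypothetical. A vertex $x$ outside $C$ qualifies exactly when a single clique vertex $y$ is missing from its neighborhood.

```python
def vertex_fusion(adj, clique):
    """Merge each vertex x outside the clique into the unique clique vertex y
    it is not adjacent to, when x is adjacent to all other clique vertices."""
    adj = {v: set(ns) for v, ns in adj.items()}
    clique = set(clique)
    for x in list(adj):
        if x in clique:
            continue
        missing = clique - adj[x]  # clique vertices not adjacent to x
        if len(missing) == 1:
            (y,) = missing
            # every neighbor of x becomes a neighbor of y, then x is erased
            for z in adj.pop(x):
                adj[z].discard(x)
                adj[z].add(y)
                adj[y].add(z)
    return adj


# Clique {1, 2, 3} with k = 3; vertex 4 sees 1 and 2, so it is merged into 3.
g = {1: {2, 3, 4}, 2: {1, 3, 4}, 3: {1, 2}, 4: {1, 2, 5}, 5: {4}}
fused = vertex_fusion(g, {1, 2, 3})
```

After the merge, vertex 5 (formerly a neighbor of 4) is adjacent to vertex 3.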



2.5 Edge Addition 

Suppose that we are searching for a k-coloring of $G = (V, E)$ and that a clique
$C$ of size $k$ has been previously determined. For each pair of non-adjacent
vertices $x, y \in V$, if $\forall z \in C$ we have $(x, z) \in E$ or
$(y, z) \in E$, then the edge $(x, y)$ can be added to the graph. Necessarily,
$x$ must take a color from the colors of $C \setminus \vartheta(x)$. Since
$\vartheta(y) \supseteq C \setminus \vartheta(x)$, $c(x) \neq c(y)$. This
constraint can be represented by an edge between $x$ and $y$. The time complexity
of this pre-processing is $O(|\bar{E}| \cdot k)$, upper bounded by
$O(N^2 \cdot k)$.
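
A small sketch of the edge-addition test, again on an adjacency dictionary with a clique given as a set (the name `edge_addition` and the returned list of added edges are conveniences of this example). The condition "every clique vertex is adjacent to $x$ or to $y$" is a subset test on the union of the two neighborhoods.

```python
from itertools import combinations


def edge_addition(adj, clique):
    """Add an edge between non-adjacent x, y whenever every clique vertex
    is a neighbor of x or of y. Returns the new graph and the added edges."""
    adj = {v: set(ns) for v, ns in adj.items()}
    clique = set(clique)
    added = []
    for x, y in combinations(list(adj), 2):
        if y not in adj[x] and clique <= (adj[x] | adj[y]):
            adj[x].add(y)
            adj[y].add(x)
            added.append((x, y))
    return adj, added


# Clique {1, 2, 3}; vertex 4 sees 1 and 2, vertex 5 sees 3.
g = {1: {2, 3, 4}, 2: {1, 3, 4}, 3: {1, 2, 5}, 4: {1, 2}, 5: {3}}
new_g, new_edges = edge_addition(g, {1, 2, 3})
```

In this example vertex 4 is forced to take the color of vertex 3, which vertex 5 cannot use, so the edge (4, 5) is added.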



Algorithm 1 Pre-processings

Input: a graph G and an integer k
Output: a graph G' k-colorable if and only if G is k-colorable

repeat
    reduction 1
    reduction 2
    if there exists at least one clique of size k then
        apply vertex fusion and edge addition on G
    end if
until there is no more change in G
G' = G

2.6 Pre-processing Experiments

Our algorithms have been implemented in C on a PC with an AMD Athlon XP 2000+
processor. The method used is as follows. To start with, we apply a fast clique
search algorithm to the input graph G: as long as the graph is not triangulated,
we remove a vertex of smallest degree, and then we color the remaining
triangulated graph by determining a perfect elimination order [12] on the
vertices of G. The size of the clique provided by this algorithm, denoted LB,
constitutes a lower bound on the chromatic number of G. Then we apply to G the
pre-processings described in Algorithm 1, supposing that we are searching for a
k-coloring of the graph with k = LB. We performed tests on benchmark instances
used at the computational symposium COLOR02, including well-known DIMACS
instances (see the description of the instances at
http://mat.gsia.cmu.edu/COLOR02). Results are reported in Table 1. For each
graph, we indicate the initial number of vertices N and the number of edges M.
The column LB contains the size of the largest clique found. The percentage of
vertices deleted by the pre-processings is reported in column Del. The number of
vertices remaining after the pre-processing step is reported in column new_N.
Note that some of the instances are totally reduced by the pre-processings when
k = LB, and that some of them are not reduced at all.

Table 1. Pre-processings results

Graph          N      M       LB   new_N   Del
1-FullIns3     30     100     3    15      50%
1-FullIns4     93     593     3    35      62%
1-FullIns5     282    3247    3    75      73%
2-FullIns3     52     201     4    9       81%
2-FullIns4     212    1621    4    41      81%
2-FullIns5     852    12201   4    89      90%
3-FullIns3     80     346     5    11      86%
3-FullIns4     405    3524    2    51      87%
3-FullIns5     2030   33751   2    107     95%
4-FullIns3     114    541     6    13      89%
4-FullIns4     690    6650    2    58      92%
5-FullIns3     154    792     7    15      90%
5-FullIns4     1085   11395   2    65      94%
fpsol2.i.1     496    11654   65   228     54%
fpsol2.i.2     451    8691    30   175     61%
fpsol2.i.3     425    8688    30   149     65%
inithx.i.1     864    18707   54   443     49%
inithx.i.2     645    13979   31   215     67%
inithx.i.3     621    13969   31   190     69%
mulsol.i.1     197    3925    49   60      70%
mulsol.i.2     188    3885    31   88      53%
mulsol.i.3     184    3916    31   83      55%
mulsol.i.4     185    3946    31   85      54%
mulsol.i.5     186    3973    31   84      55%
school1        385    19095   14   360     6%
school1_nsh    352    14612   14   331     6%
3-Inser_3      56     110     2    56      0%
4-Inser_3      79     156     2    79      0%
le450_25a      450    8260    20   297     34%
le450_25b      450    8263    25   294     35%
anna           138    493     11   0       100%
david          87     812     11   0       100%
homer          561    1629    13   0       100%
jean           80     508     10   0       100%
mug100-1       100    166     3    100     0%
mug100-25      100    166     3    100     0%
mug88-1        88     146     3    88      0%
mug88-25       88     146     3    88      0%
miles250       128    387     7    34      73%
miles500       128    2340    20   0       100%
miles750       128    4226    31   0       100%
miles1000      128    6432    42   0       100%
miles1500      128    10396   73   0       100%
DSJR500.1      500    3555    12   28      94%
zeroin.i.1     211    4100    49   86      59%
zeroin.i.2     211    3541    30   55      74%
zeroin.i.3     206    3540    30   50      76%
games120       120    638     9    0       100%

3 Linear-Decomposition Applied to the k-Colorability Problem

In this section, we propose a method which combines linear-decomposition with
the Dsatur heuristic in order to solve the k-colorability problem.

3.1 Definitions 

We consider a graph $G = (V, E)$. Let $N = |V|$ and $M = |E|$. A vertex linear
ordering of $G$ is a bijection $\mathcal{N} : V \to \{1, \ldots, N\}$. For
clarity, we denote by $i$ the vertex $\mathcal{N}^{-1}(i)$. Let $V_i$ be the
subset of $V$ made of the vertices numbered from 1 to $i$. Let
$H_i = (V_i, E_i)$ be the subgraph of $G$ induced by $V_i$. Let
$F_i = \{j \in V \mid \exists (j, l) \in E,\ j \leq i < l\}$,
$\forall i \in \{1, \ldots, |V|\}$. $F_i$ is the boundary set of $H_i$. Let
$H'_i = (V'_i, E'_i)$ be the subgraph of $G$ such that
$V'_i = (V \setminus V_i) \cup F_i$ and $E'_i = E \cap (V'_i \times V'_i)$. The
boundary set $F_i$ corresponds to the set of vertices joining $H_i$ to $H'_i$
(see Fig. 1).

The linearwidth of a vertex linear ordering $\mathcal{N}$ is
$F_{\max}(\mathcal{N}) = \max_{i \in V} |F_i|$. We use a vertex linear ordering
of the graph to solve the k-colorability problem with a linear-decomposition. The
solution method is based on a sequential insertion of the vertices, using a
vertex linear ordering previously determined. This will be developed in the
following section.
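
The definitions of boundary set and linearwidth translate directly into code. The sketch below is illustrative only: the graph is an adjacency dictionary, the ordering is a Python list whose position $p$ (starting at 1) plays the role of the vertex number, and the names `boundary_sets` and `linearwidth` are assumptions of this example.

```python
def boundary_sets(adj, order):
    """Return [F_1, ..., F_N] for the vertex linear ordering `order`.

    A vertex numbered j <= i belongs to F_i when it has a neighbor numbered > i.
    """
    number = {v: p for p, v in enumerate(order, start=1)}
    # number of the last (highest-numbered) neighbor of each vertex
    last = {v: max((number[w] for w in adj[v]), default=0) for v in order}
    boundaries = []
    current = set()
    for i, v in enumerate(order, start=1):
        current.add(v)
        current = {u for u in current if last[u] > i}
        boundaries.append(set(current))
    return boundaries


def linearwidth(adj, order):
    """Size of the largest boundary set met along the ordering."""
    return max(len(f) for f in boundary_sets(adj, order))


# Star with center c: numbering the center first keeps the boundary small.
star = {"c": {"a", "b", "d"}, "a": {"c"}, "b": {"c"}, "d": {"c"}}
width_center_first = linearwidth(star, ["c", "a", "b", "d"])
width_center_last = linearwidth(star, ["a", "b", "d", "c"])
```

The star example shows how strongly the linearwidth depends on the chosen numbering: it is 1 when the center comes first and 3 when it comes last.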





Fig. 1. A subgraph $H_{10}$ of $G$ and its boundary set $F_{10} = \{7, 8, 10\}$

3.2 Linear-Decomposition Algorithm

The details of the implementation of the linear-decomposition method are
reported in Algorithm 2. The vertices of $G$ are numbered according to a linear
ordering $\mathcal{N} : V \to \{1, \ldots, N\}$. Then, during the coloring, we
consider $N$ subgraphs $H_1, \ldots, H_N$ and the $N$ corresponding boundary sets
$F_1, \ldots, F_N$, as defined in Sect. 3.1.



Algorithm 2 k-colorability

Input: a graph G and an integer k
Output: Result: True if and only if G is k-colorable

F_1 = {1}
C(H_1, 1) = [1]
i = 2
Result = True
while i <= N and Result do
    Result = False
    Build H_i and F_i
    for each configuration C(H_{i-1}, x) of F_{i-1} do
        for j = 1 to number of blocks of C(H_{i-1}, x) do
            if i does not have any neighbor in the block j then
                Result = True
                part = C(H_{i-1}, x)
                insert i in the block j of part
                generate the configuration C(H_i, y) corresponding to part
                val(C(H_i, y)) = min(val(C(H_i, y)), val(C(H_{i-1}, x)))
            end if
        end for
        if number of blocks of C(H_{i-1}, x) < k then
            Result = True
            part = C(H_{i-1}, x)
            add to part a new block containing i
            val(part) = max(val(C(H_{i-1}, x)), number of blocks of part)
            generate the configuration C(H_i, y) corresponding to part
            val(C(H_i, y)) = min(val(C(H_i, y)), val(part))
        end if
    end for
    i = i + 1
end while



The complexity of the linear-decomposition is exponential with respect to
$F_{\max}(\mathcal{N})$, so it is necessary to make a good choice when numbering
the vertices of the graph. Unfortunately, finding an optimal vertex linear
ordering in order to obtain the smallest linearwidth is an NP-complete problem
[1]. After some experiments on various vertex numbering heuristics, we chose to
begin the numbering from the biggest clique provided by our clique search
heuristic (cf. Sect. 2.6). Then we order the vertices by decreasing number of
already numbered neighbors.
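
One possible reading of this numbering heuristic is sketched below. It assumes a greedy, dynamic interpretation: after the clique vertices, the next vertex numbered is always the unnumbered one with the most already numbered neighbors, ties being broken by dictionary order. The text does not specify the tie-breaking rule, so that part, like the name `vertex_ordering`, is an assumption.

```python
def vertex_ordering(adj, clique):
    """Number the clique first, then greedily pick the unnumbered vertex
    having the largest number of already numbered neighbors."""
    order = list(clique)
    numbered = set(order)
    while len(order) < len(adj):
        candidates = [u for u in adj if u not in numbered]
        # max() keeps the first maximal candidate, i.e. dictionary order on ties
        v = max(candidates, key=lambda u: len(adj[u] & numbered))
        order.append(v)
        numbered.add(v)
    return order


# Triangle 1-2-3, vertex 4 attached to 3, vertex 5 attached to 1 and 2.
g = {1: {2, 3, 5}, 2: {1, 3, 5}, 3: {1, 2, 4}, 4: {3}, 5: {1, 2}}
numbering = vertex_ordering(g, [1, 2, 3])
```

Vertex 5 has two numbered neighbors once the clique is placed, so it is numbered before vertex 4.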

Starting from a vertex linear ordering, we build at the first iteration a
subgraph $H_1$ which contains only the vertex 1; then at each step the next
vertex and its corresponding edges are added, until $H_N$ is reached. To each
subgraph $H_i$ corresponds a boundary set $F_i$ containing the vertices of $H_i$
which have at least one neighbor in $H'_i$. The boundary set $F_i$ is built from
$F_{i-1}$ by adding the vertex $i$ and removing the vertices whose neighbors are
all numbered at most $i$. Several colorings of $H_i$ may correspond to the same
coloring of $F_i$. Moreover, the colors used by the vertices of
$V_i \setminus F_i$ do not interfere with the coloring of the vertices which have
an ordering number greater than $i$, since no edge exists between them. So, only
the partial solutions corresponding to different colorings of $F_i$ have to be
stored in memory. This way, several partial solutions on $H_i$ may be summarized
by a unique partial solution on $F_i$, called a configuration of $F_i$.

A configuration of the boundary set $F_i$ is a given coloring of the vertices of
$F_i$. This can be represented by a partition of $F_i$, denoted
$B_1, \ldots, B_j$, such that two vertices $u, v$ of $F_i$ are in the same block
$B_c$ if they have the same color. The number of configurations of $F_i$
obviously depends on the number of edges between the vertices of $F_i$. The
minimum number of configurations is 1. If the vertices of $F_i$ form a clique,
only one configuration is possible: $B_1, \ldots, B_{|F_i|}$, with exactly one
vertex in each block. The maximal number of configurations of $F_i$ equals the
number of possible partitions of a set with $|F_i|$ elements. When no edge exists
between the boundary set vertices, all the partitions are to be considered. Their
number $T(F_i)$ grows exponentially with the size of $F_i$. Their ordering number
$x$, between 1 and $T(F_i)$, is computed by an algorithm according to their
number of blocks and their number of elements. This algorithm uses the recursive
principle of Stirling numbers of the second kind. The partitions of sets with at
most four elements and their ordering numbers are reported in Table 2. Let
$C(H_i, x)$ be the $x$-th configuration of $F_i$ for the subgraph $H_i$. Its
value, denoted $val(C(H_i, x))$, equals the minimum number of colors necessary to
color $H_i$ for this configuration.
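
The counts behind this classification can be checked with the standard recurrence for Stirling numbers of the second kind, $S(n, j) = j \cdot S(n-1, j) + S(n-1, j-1)$. The sketch below only counts partitions; it does not reproduce the authors' ranking procedure, and the function names are chosen for the example.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def stirling2(n, j):
    """Number of partitions of an n-element set into exactly j blocks."""
    if n == j:
        return 1
    if j == 0 or j > n:
        return 0
    # the n-th element joins one of the j blocks, or forms a block of its own
    return j * stirling2(n - 1, j) + stirling2(n - 1, j - 1)


def count_partitions(n):
    """Total number of partitions of an n-element set, summed over block counts."""
    return sum(stirling2(n, j) for j in range(1, n + 1))


totals = [count_partitions(n) for n in range(1, 5)]
```

For sets of 1 to 4 elements the totals are 1, 2, 5 and 15, and a 4-element set has 7 partitions into two blocks and 6 into three blocks.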

At step $i$, fortunately, we do not examine all the possible configurations of
step $i - 1$, but only those which have been created at the preceding step, that
is, those for which there is no edge between two vertices of the same block. For
each configuration of $F_{i-1}$, we introduce the vertex $i$ in each block
successively. Each time the introduction is possible without breaking the
coloring rules, the corresponding configuration of $F_i$ is generated. Moreover,
for each configuration of $F_{i-1}$ with value strictly lower than $k - 1$, we
also generate the configuration obtained by adding a new block containing the
vertex $i$.
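
The insertion step can be sketched as a small decision procedure. This is a simplified illustration rather than the authors' algorithm: it follows the block-count test of Algorithm 2 (a new block is opened when the configuration has fewer than $k$ blocks), it omits the values $val$ and the Dsatur calls, and it stores each configuration as a frozenset of frozensets instead of an ordering number. All names are hypothetical.

```python
def k_colorable(adj, order, k):
    """Decide k-colorability by sweeping the vertices in `order` and keeping
    only the partitions (configurations) of the current boundary set."""
    number = {v: p for p, v in enumerate(order, start=1)}
    last = {v: max((number[w] for w in adj[v]), default=0) for v in order}
    configs = {frozenset()}  # single configuration of the empty boundary
    for i, v in enumerate(order, start=1):
        new_configs = set()
        for part in configs:
            candidates = []
            for block in part:
                if not (block & adj[v]):  # v has no neighbor in this block
                    candidates.append((part - {block}) | {block | {v}})
            if len(part) < k:  # a color unused on the boundary is available
                candidates.append(part | {frozenset([v])})
            for cand in candidates:
                # project on F_i: drop vertices whose neighbors are all numbered
                blocks = (frozenset(u for u in b if last[u] > i) for b in cand)
                new_configs.add(frozenset(b for b in blocks if b))
        if not new_configs:
            return False  # no configuration survives: not k-colorable
        configs = new_configs
    return True


# A cycle on five vertices needs three colors.
c5 = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
two_colors = k_colorable(c5, [0, 1, 2, 3, 4], 2)
three_colors = k_colorable(c5, [0, 1, 2, 3, 4], 3)
```

Because every already numbered neighbor of vertex $i$ still belongs to $F_{i-1}$, testing the blocks of the boundary partition is enough to decide where $i$ may be inserted.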

In order to improve the linear-decomposition, we apply the Dsatur heuristic
evenly on the remaining graph $H'_i$, for different configurations of $F_i$. If
Dsatur finds a k-coloring, then the process ends and the answer to the
k-colorability question is yes. Otherwise the linear-decomposition continues
until either a configuration is generated at step $N$, in which case the graph is
k-colorable, or no configuration can be generated from the preceding step, in
which case the graph is not k-colorable. The complexity of the
linear-decomposition algorithm is exponential in the linearwidth of the graph,
but linear in its number of vertices.

Table 2. Classification of the partitions of sets containing from 1 to 4 elements

        j = 1       j = 2          j = 3            j = 4             T(i)
i = 1   1 [1]                                                         T(1) = 1
i = 2   1 [12]      2 [1][2]                                          T(2) = 2
i = 3   1 [123]     2 [13][2]      5 [1][2][3]                        T(3) = 5
                    3 [1][23]
                    4 [12][3]
i = 4   1 [1234]    2 [134][2]     9 [14][2][3]     15 [1][2][3][4]   T(4) = 15
                    3 [13][24]     10 [1][24][3]
                    4 [14][23]     11 [1][2][34]
                    5 [1][234]     12 [13][2][4]
                    6 [124][3]     13 [1][23][4]
                    7 [12][34]     14 [12][3][4]
                    8 [123][4]



3.3 Example of Configuration Computing 

Assume that we are searching for a 3-coloring of the graph $G$ of Fig. 2.
Suppose that at step $i - 1$ we had $F_{i-1} = \{u, v\}$. The configurations of
$F_{i-1}$ were $C(H_{i-1}, 1) = [uv]$ of value $\alpha$ and
$C(H_{i-1}, 2) = [u][v]$ of value $\beta$. The value of $\beta$ is 2 or 3, since
the corresponding configuration has 2 blocks and $k = 3$.

Suppose that at step $i$, vertex $u$ is deleted from the boundary set (we suppose
that all its neighbors are numbered at most $i$), so $F_i = \{v, i\}$. We want to
generate the configurations of $F_i$ from the configurations of $F_{i-1}$. The
insertion of $i$ in the unique block of $C(H_{i-1}, 1)$ is impossible, since $u$
and $i$ are neighbors. It is possible to add a new block, which provides the
partition $[uv][i]$ of 2 blocks, corresponding to the configuration
$C(H_i, 2) = [v][i]$ with $val(C(H_i, 2)) = \max(\alpha, 2)$. Vertex $i$ can be
introduced in the second block of $C(H_{i-1}, 2)$. This provides the partition
$[u][vi]$, corresponding to the configuration $C(H_i, 1) = [vi]$ with value
$\beta$. It is also possible to add a new block to $C(H_{i-1}, 2)$, which
provides the partition $[u][v][i]$ of 3 blocks, corresponding to the
configuration $C(H_i, 2) = [v][i]$. This configuration already exists, so
$val(C(H_i, 2)) = \min(val(C(H_i, 2)), \max(\beta, 3))$. Thus two configurations
are provided at step $i$; they are used to determine the configurations of the
following step, and so on until the whole graph is colored.



4 k-Colorability Experiments 

We performed experiments on the reduced instances of Table 1. Obviously, we did
not test instances that were already solved by the pre-processings. Results of
these experiments are reported in Table 3. For each instance, we tested
successive k-colorings, with k starting from LB and increasing in steps of 1
until a coloring exists. We report the result and computing time of our
linear-decomposition algorithm kColor, for one or two relevant values of k. We
also give in column $F_{\max}$ the linearwidth $F_{\max}(\mathcal{N})$ of the
chosen vertex linear ordering. Most of these instances are easily solved. The
configurations generated by instance 2-FullIns5 for a 6-coloring exceeded the
memory capacity of our computer, so for this instance we give the results for a
5-coloring and for a 7-coloring. Instances 2-FullIns4, 3-FullIns4, 4-FullIns4,
5-FullIns4 and 4-Inser_3 are solved exactly, whereas no exact method had been
able to solve them at the COLOR02 computational symposium (see
http://mat.gsia.cmu.edu/COLOR02/summary.htm for all results).



Table 3. k-colorability results

Problem        F_max   k    kColor   Time      k    kColor   Time
1-FullIns3     17      3    no       0.00      4    yes      0.00
1-FullIns4     46      4    no       0.02      5    yes      0.02
1-FullIns5     132     5    no       436.17    6    yes      0.05
2-FullIns3     22      4    no       0.00      5    yes      0.00
2-FullIns4     93      5    no       0.02      6    yes      0.02
2-FullIns5     359     5    no       29.50     7    yes      0.15
3-FullIns3     36      5    no       0.00      6    yes      0.00
3-FullIns4     200     6    no       0.03      7    yes      0.03
4-FullIns3     49      6    no       0.00      7    yes      0.00
4-FullIns4     302     7    no       0.08      8    yes      0.13
5-FullIns3     64      7    no       0.00      8    yes      0.00
5-FullIns4     447     8    no       0.22      9    yes      0.20
3-Inser_3      16      3    no       29.27     4    yes      0.00
4-Inser_3      20      3    no       1772.95   4    yes      0.00
mug88-1        8       3    no       0.00      4    yes      0.00
mug88-25       8       3    no       0.00      4    yes      0.00
mug100-1       7       3    no       0.02      4    yes      0.00
mug100-25      8       3    no       0.00      4    yes      0.00
miles250       16      7    no       0.00      8    yes      0.00
le450_25a      293     24   no       0.00      25   yes      0.98
le450_25b      303     25   yes      0.00
fpsol2.i.1     82      65   yes      0.00
fpsol2.i.2     50      30   yes      0.00
fpsol2.i.3     50      30   yes      0.00
inithx.i.1     69      54   yes      0.00
inithx.i.2     42      31   yes      0.00
inithx.i.3     42      31   yes      0.00
mulsol.i.1     61      49   yes      0.00
mulsol.i.2     45      31   yes      0.00
mulsol.i.3     46      31   yes      0.00
mulsol.i.4     45      31   yes      0.00
mulsol.i.5     47      31   yes      0.00
school1        291     14   yes      0.00
school1_nsh    258     14   yes      0.00
zeroin.i.1     62      49   yes      0.00
zeroin.i.2     45      30   yes      0.00
zeroin.i.3     45      30   yes      0.00
DSJR500.1      68      12   yes      0.02

5 Conclusions 

In this paper, we have presented some pre-processings that effectively reduce
the size of some difficult coloring instances. We have also presented an original
exact method to solve the graph coloring problem. This method has the advantage
of easily solving large instances that have a bounded linearwidth. The
computational results obtained on instances from the literature are very
satisfactory. We are considering combining linear-decomposition with heuristic
approaches to deal with instances of unbounded linearwidth. We are also looking
for more reduction techniques to reduce the size or the difficulty of these
instances.

Abstract. We describe in this extended abstract the experiments that
we have conducted in order to fulfil the following goals: a) to obtain
a working implementation of the unranking algorithms that we have
presented in previous works; b) to assess the validity and range of
application of our theoretical analysis of the performance of these algorithms;
c) to provide preliminary figures on the practical performance of these
algorithms under a reasonable environment; and finally, d) to compare
these algorithms with the algorithms for random generation. Additionally,
the experiments support our conjecture that the average complexity
of boustrophedonic unranking is $\Theta(n \log n)$ for many combinatorial
classes (namely, those whose specification requires recursion) and that it
performs only slightly worse than lexicographic unranking for iterative
classes (those which do not require recursion to be specified).



1 Introduction 

The problem of unranking asks for the generation of the $i$-th combinatorial
object of size $n$ in some combinatorial class $\mathcal{A}$, according to some
well defined order among the objects of size $n$ of the class. Efficient
unranking algorithms have been devised for many different combinatorial classes,
like binary and Cayley trees, Dyck paths, permutations, strings or integer
partitions, but most of the work in this area concentrates on efficient
algorithms for particular classes, whereas we aim at generic algorithms that
apply to a broad family of combinatorial classes. The problem of unranking is
intimately related to its converse, the ranking problem, as well as to the
problems of random generation and exhaustive generation of all combinatorial
objects of a given size. The interest of this whole subject is witnessed by the
vast number of research papers and books that have appeared in over four decades
(see for instance [1,2,7,8,9,13,14,15]).

In [10,11] we have designed generic unranking algorithms for a large family of
combinatorial classes, namely, those which can be inductively built from the
basic ε-class (a class which contains only one object of size 0), atomic classes
(classes that contain only one object of size 1, or atom) and a collection of
admissible combinatorial operators: disjoint unions, labelled and unlabelled
products, sequence, set, etc. We say our algorithms are generic in the sense that
they receive a finite description of the combinatorial class $\mathcal{A}$ to
which the sought object belongs, besides the rank $i$ and size $n$ of the object
to be generated. Our approach provides considerable flexibility, thus making
these algorithms attractive for inclusion in general purpose combinatorial
libraries such as combstruct for Maple [3] and MuPAD-combinat for MuPAD (see
mupad-combinat.sourceforge.net), as well as for rapid prototyping.

* This research was supported by the Future and Emergent Technologies programme
of the EU under contract IST-1999-14186 (ALCOM-FT) and the Spanish "Ministerio
de Ciencia y Tecnología" programme TIC2002-00190 (AEDRI II).

C.C. Ribeiro and S.L. Martins (Eds.): WEA 2004, LNCS 3059, pp. 326-340, 2004.
© Springer-Verlag Berlin Heidelberg 2004

Besides designing the unranking algorithms, we tackled in [10,11] the analysis
of their performance. We were able to prove that for classes whose specification
does not involve recursion (the so-called iterative classes) unranking can be
performed in time linear in the size of the object. On the other hand, for those
classes whose specification requires the use of recursion (for instance, binary
trees) the worst-case complexity of unranking is $O(n^2)$, whereas the
average-case complexity is typically $O(n\sqrt{n})$.

The results just mentioned on the average and worst-case complexity of
unranking apply when we generate objects according to the lexicographic order;
the use of the somewhat "extravagant" boustrophedonic order substantially
improves the complexity, namely to $O(n \log n)$ in the worst case. Later on we
will precisely define the lexicographic and boustrophedonic orders.

Our analysis showed that the performance of unranking algorithms coincides
with that of the random generation algorithms devised by Flajolet et al. [6],
where the same generic approach for the random generation of combinatorial
objects was first proposed. Thus if the random generation algorithm has
average-case (worst-case) complexity $\Theta(f(n))$ to generate objects of size
$n$ in some class $\mathcal{A}$, so does the unranking algorithm for that class.

However, the analysis of the performance of the unranking algorithms (the same
applies to the analysis of random generation algorithms) introduced several
simplifications, so two important questions were left unanswered:

1. What is the constant factor in the average-case complexity of the unranking
algorithm for a given class $\mathcal{A}$? In particular, for objects unranked
using the boustrophedonic order the analysis did not provide even a rough
estimate of this constant.

2. How does unranking compare with random generation? Both unranking and random
generation have complexities of identical order of magnitude, but the analysis
did not settle how the respective constant factors compare to each other.

In order to provide an answer (even if partial) to these questions we have
developed working implementations of our unranking algorithms in Maple and
conducted a series of experiments to measure their practical performance. We have
instrumented our programs to collect data about the number of arithmetical
operations that they perform. In this context this is the usual choice to measure
implementation-independent performance, much in the same manner that key
comparisons are used to measure the performance of sorting and searching
algorithms. On the other hand, we have also collected data on their actual
execution times. While these last measurements should have been done on several
platforms and for several different implementations in order to draw
well-grounded conclusions, they confirm the main trends that the
implementation-independent measurements showed when we compared unranking and
random generation. In spite of the theoretical equivalence of the complexity of
unranking and random generation, the comparison in practical terms is interesting
if we want to produce samples without repetition. We can use the rejection
method, generating objects of size $n$ at random until we have collected $m$
distinct objects, or we can generate $m$ distinct ranks at random in $O(m)$ time
using Floyd's algorithm and then make the corresponding $m$ calls to the unrank
function. While the second is theoretically better, the answer could be
different on practical grounds (especially if $m$ is not very big and random
generation were more efficient than unranking by a large factor).
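
Floyd's sampling algorithm, mentioned above, fits in a few lines. The sketch below is written in Python for illustration (the experiments themselves were done in Maple); the function name, the 0-based ranks and the optional `rng` parameter are assumptions of this example. With a hash-based set, each of the $m$ iterations takes expected constant time.

```python
import random


def floyd_sample(total, m, rng=random):
    """Return a set of m distinct ranks drawn uniformly from 0 .. total - 1."""
    chosen = set()
    for j in range(total - m, total):
        t = rng.randint(0, j)  # uniform over 0 .. j, both ends included
        # if t was already drawn, j itself cannot have been: take j instead
        chosen.add(j if t in chosen else t)
    return chosen


# Five distinct ranks among the 14 binary trees of size 4.
ranks = floyd_sample(14, 5)
```

Each rank returned would then be passed to the unranking function, giving a sample of $m$ distinct objects with exactly $m$ unranking calls.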

Our experiments also show that the unranking algorithms have acceptable running
times (there is some penalty that we have to pay for the generality and
flexibility) on a relatively common and modest platform when $n < 800$. For
larger $n$, we face the problem of storing and manipulating huge counters (the
number of combinatorial structures typically grows exponentially in $n$ or even
as $n!$).

Last but not least, the experiments have been useful to show that our asymptotic
analyses already provide good estimates for small values of $n$.

The paper is organized as follows. In Section 2 we briefly review basic
definitions and concepts, the unranking algorithms and the theoretical analysis
of their performance. Then, in Sections 3 and 4 we describe the experimental
setup and the results of our experiments.

2 Preliminaries 

As will become apparent, all the unranking algorithms in this paper require
an efficient algorithm for counting, that is, given a specification of a class and 
a size, they need to compute the number of objects with the given size. Hence, 
we will only deal with so-called admissible combinatorial classes [4,5]. Those are 
constructed from admissible operators, operations over classes that yield new 
classes, and such that the number of objects of a given size in the new class 
can be computed from the number of objects of that size or smaller sizes in 
the constituent classes. In this paper we consider labelled and unlabelled objects 
built from these admissible combinatorial operators. Unlabelled objects are those 
whose atoms are indistinguishable. On the contrary, each of the n atoms of a 
labelled object of size n bears a distinct label drawn from the numbers 1 to n. 

For labelled classes, the finite specifications are built from the ε-class (with
a single object of size 0 and no labels), labelled atomic classes, and the
following combinatorial operators: union ('+'), partitional product ('⋆'),
sequence ('Seq'), set ('Set'), cycle ('Cycle'), substitution ('Subst'), and
sequence, set and cycle with restricted cardinality. For unlabelled classes, the
finite specifications are generated from the ε-class, atomic classes, and
combinatorial operators including union ('+'), Cartesian product ('×'), sequence
('Seq'), powerset ('PowerSet'), set¹ ('Set'), substitution ('Subst'), and
sequence, set and powerset with restricted cardinality. Figure 2 gives a few
examples of both labelled and unlabelled admissible classes.

For the rest of this paper, we will use calligraphic uppercase letters to denote
classes: $\mathcal{A}, \mathcal{B}, \mathcal{C}, \ldots$. Given a class
$\mathcal{A}$ and a size $n$, $\mathcal{A}_n$ will denote the subset of objects
of size $n$ in $\mathcal{A}$ and $a_n$ the number of such objects. Furthermore,
given a class $\mathcal{A}$ we denote by $T\mathcal{A}_n$ the cumulated cost of
unranking all the elements of size $n$ in $\mathcal{A}$. The average cost of
unranking will then be given by $T\mathcal{A}_n / a_n$, if we assume all objects
of size $n$ to be equally likely.

The unranking algorithms themselves are not too difficult, except perhaps 
those for unlabelled powersets, multisets, cycles and their variants. Actually, we 
have not found any efficient algorithm for the unranking of unlabelled cycles, and 
they will not be considered any further in this paper. One important observation 
is that any unranking algorithm depends, by definition, on the order that we have 
imposed on the combinatorial class, and that the order itself will depend on a 
few basic rules plus the given specification of the class. 

For instance, the order $\prec_{\mathcal{C}_n}$ among the objects of size $n$
for a class $\mathcal{C} = \mathcal{A} + \mathcal{B}$ is naturally defined by
$\gamma \prec_{\mathcal{C}_n} \gamma'$ if both $\gamma$ and $\gamma'$ belong to
the same class (either $\mathcal{A}_n$ or $\mathcal{B}_n$) and
$\gamma \prec \gamma'$ within their class, or if $\gamma \in \mathcal{A}_n$ and
$\gamma' \in \mathcal{B}_n$. It is then clear that although
$\mathcal{A} + \mathcal{B}$ and $\mathcal{B} + \mathcal{A}$ are isomorphic ("the
same class"), these two specifications induce quite different orders. The
unranking algorithm for disjoint unions compares the given rank with the
cardinality of $\mathcal{A}_n$ to decide if the sought object belongs to
$\mathcal{A}_n$ or to $\mathcal{B}_n$ and then solves the problem by recursively
calling the unranking on whatever class ($\mathcal{A}$ or $\mathcal{B}$) is
appropriate.
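
The rule for disjoint unions can be illustrated with two toy classes. Everything in the sketch below is hypothetical and chosen only for the example: class A has a single object per size (the word made of $n$ letters "a"), class B contains the $2^n$ binary words of length $n$ in numeric order, and ranks start at 0.

```python
def unrank_union(i, n, count_a, unrank_a, unrank_b):
    """Unrank in C = A + B: objects of A_n come first, then those of B_n."""
    a_n = count_a(n)
    if i < a_n:
        return unrank_a(i, n)
    return unrank_b(i - a_n, n)


# Toy class A: one object per size.
def count_a(n):
    return 1


def unrank_a(i, n):
    return "a" * n


# Toy class B: binary words of length n, ordered by numeric value.
def count_b(n):
    return 2 ** n


def unrank_b(i, n):
    return format(i, "0{}b".format(n))


a_plus_b = [unrank_union(i, 3, count_a, unrank_a, unrank_b) for i in range(9)]
b_plus_a = [unrank_union(i, 3, count_b, unrank_b, unrank_a) for i in range(9)]
```

The two lists contain the same nine objects, but "aaa" has rank 0 under A + B and rank 8 under B + A, which is the dependence on the specification described above.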

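For Cartesian products the order in $\mathcal{C}_n = (\mathcal{A} \times \mathcal{B})_n$ depends on whether
$\gamma = (\alpha, \beta)$ and $\gamma' = (\alpha', \beta')$ have first components of the same size.
If $|\alpha| = |\alpha'| = j$ then we have $\gamma \prec_{\mathcal{C}_n} \gamma'$ if
$\alpha \prec_{\mathcal{A}_j} \alpha'$, or $\alpha = \alpha'$ and $\beta \prec_{\mathcal{B}_{n-j}} \beta'$. But when
$|\alpha| \neq |\alpha'|$, we must provide a criterion to order $\gamma$ and $\gamma'$. The
lexicographic order stems from the specification

$$\mathcal{C}_n = \mathcal{A}_0 \times \mathcal{B}_n + \mathcal{A}_1 \times \mathcal{B}_{n-1} + \cdots + \mathcal{A}_n \times \mathcal{B}_0,$$

in other words, the smaller object is that with the smaller first component. On the
other hand, the boustrophedonic order is induced by the specification

$$\mathcal{C}_n = \mathcal{A}_0 \times \mathcal{B}_n + \mathcal{A}_n \times \mathcal{B}_0 + \mathcal{A}_1 \times \mathcal{B}_{n-1} + \mathcal{A}_{n-1} \times \mathcal{B}_1 + \mathcal{A}_2 \times \mathcal{B}_{n-2} + \cdots,$$

in other words, we consider that the smaller pairs of total size $n$ are those whose
$\mathcal{A}$-component has size 0, then those with $\mathcal{A}$-component of size $n$, then those with
$\mathcal{A}$-component of size 1, and so on. Figure 1 shows the lists of unlabelled binary
trees of size 4 in lexicographic (a) and boustrophedonic order (b).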
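To make the boustrophedonic specification concrete, the following Python sketch lists the sizes of the first component in the order in which the blocks are visited. The function name `boustrophedonic_sizes` is an assumption introduced for this illustration only.

```python
def boustrophedonic_sizes(n):
    """Sizes of the A-component in the order 0, n, 1, n-1, 2, n-2, ..."""
    order = []
    lo, hi = 0, n
    while lo < hi:
        order.extend([lo, hi])  # one block from each end
        lo += 1
        hi -= 1
    if lo == hi:                # middle block, present when n is even
        order.append(lo)
    return order


sizes_for_4 = boustrophedonic_sizes(4)
sizes_for_5 = boustrophedonic_sizes(5)
```

The lexicographic order corresponds simply to `list(range(n + 1))`; alternating between the two ends is what lets the search for the right block stop early when one component is small.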

^ We shall sometimes use the term ’multisets’ to refer to these, to emphasize that 
repetition is allowed. 
[Figure: (a) Lexicographic order; (b) Boustrophedonic order.]

Fig. 1. Binary trees of size 4.



Other orders are of course possible, but they either do not help improve
the performance of unranking or they are too complex to be useful or of general
applicability. The unranking algorithm for products is also simple: find the least
$j$ such that the given rank $i$ satisfies

$$\sum_{k=0}^{j-1} a_k b_{n-k} \;\le\; i \;<\; \sum_{k=0}^{j} a_k b_{n-k},$$

with the provision that all ranks begin at 0 (thus the rank of $\alpha \in \mathcal{A}_n$ is the
number of objects in $\mathcal{A}_n$ which are strictly smaller than $\alpha$).
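As an illustration of the product rule, here is a minimal Python sketch of lexicographic unranking for unlabelled binary trees, B = Z + B × B, where the size of a tree is its number of leaves. The representation (the string `"Z"` for a leaf, a pair for an internal node) and the names `count` and `unrank` are assumptions of this sketch; it is not the Maple code used in the experiments.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count(n):
    """Number of unlabelled binary trees with n leaves."""
    if n == 1:
        return 1
    return sum(count(k) * count(n - k) for k in range(1, n))


def unrank(n, rank):
    """Return the tree of size n with the given rank (ranks start at 0)."""
    if not 0 <= rank < count(n):
        raise ValueError("rank out of range")
    if n == 1:
        return "Z"
    # Find the least j such that the rank falls inside the block B_j x B_{n-j}.
    for j in range(1, n):
        block = count(j) * count(n - j)
        if rank < block:
            break
        rank -= block
    # Inside the block, pairs are ordered by first component, then by second.
    left_rank, right_rank = divmod(rank, count(n - j))
    return (unrank(j, left_rank), unrank(n - j, right_rank))


trees_of_size_4 = [unrank(4, r) for r in range(count(4))]
```

The loop over `j` is the non-recursive part of the algorithm; its number of iterations is the quantity that the cost analysis later in this section charges for.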

For labelled products we use the same orders (lexicographic, boustrophedonic),
but we must also take the labels of the atoms into account. An object $\gamma$ in
the labelled product $\mathcal{A} \star \mathcal{B}$ is actually a 3-tuple $(\alpha, \beta, \rho)$ where $\rho$ is a
partition of the labels $\{1, \ldots, n\}$ into the set of labels attached to $\alpha$'s atoms
and the set of labels attached to $\beta$'s atoms. We will assume that if $\alpha = \alpha'$ and
$\beta = \beta'$ then $\gamma = (\alpha, \beta, \rho) \prec \gamma' = (\alpha', \beta', \rho')$ whenever $\rho < \rho'$
according to the natural lexicographical criterion. The rest of the combinatorial
constructs work much in the same way as Cartesian products and we will not
describe them here; the distinction between lexicographic and boustrophedonic
ordering makes sense for the other combinatorial constructs, and so we shall
speak, for instance, of unranking labelled sets in lexicographic order or of
unranking unlabelled sequences in boustrophedonic order.

The theoretical performance of the unranking algorithms is given by the 
following results (see [11], also [6]). 
Labelled class                           Specification
Cayley trees                             A = Z ⋆ Set(A)
Binary plane trees                       B = Z + B ⋆ B
Hierarchies                              C = Z + Set(C, card ≥ 2)
Surjections                              D = Seq(Set(Z, card ≥ 1))
Functional graphs                        E = Set(Cycle(A))

Unlabelled class                         Specification
Binary sequences                         A = Seq(Z + Z)
Rooted unlabelled trees                  C = Z × Set(C)
Non plane ternary trees                  D = Z + Set(D, card = 3)
Integer partitions with distinct parts   E = PowerSet(Seq(Z, card ≥ 1))

Fig. 2. Examples of labelled and unlabelled classes and their specifications



Theorem 1. The worst-case time complexity of unranking for objects of size
$n$ in any admissible labelled class $\mathcal{A}$ using lexicographic ordering is $O(n^2)$
arithmetic operations.

Theorem 2. The worst-case time complexity of unranking for objects of size $n$
in any admissible labelled class $\mathcal{A}$ using boustrophedonic ordering is $O(n \log n)$
arithmetic operations.

A particularly important case which deserves explicit treatment is that of
iterative combinatorial classes. A class is iterative if it is specified without using
recursion (in technical terms, if the dependency graph of the specification is
acyclic). Examples of iterative classes include surjections (Seq(Set(Z, card ≥ 1)))
and permutations (Set(Cycle(Z))).

Theorem 3. The cost of unranking any object of size $n$, using either
lexicographic or boustrophedonic order, in any iterative class $\mathcal{A}$ is $O(n)$.

Last but not least, the average-case cost of unranking and random generation
under the lexicographic order can be obtained by means of a cost algebra that
we describe very briefly here (for more details see [6,11]). Let $\alpha$ be the
$i$-th object of $\mathcal{A}_n$ and $cu(\alpha)$ the cost of unranking this object $\alpha$;
then the cumulated cost of unranking all the objects in $\mathcal{A}_n$ is defined by
$\Gamma\mathcal{A}_n = \sum_{\alpha \in \mathcal{A}_n} cu(\alpha)$. Now, if we introduce exponential and
ordinary generating functions for the cumulated costs in labelled and unlabelled
classes respectively,

$$\Gamma A(z) = \sum_{n \ge 0} \Gamma\mathcal{A}_n \frac{z^n}{n!} = \sum_{\alpha \in \mathcal{A}} cu(\alpha) \frac{z^{|\alpha|}}{|\alpha|!}
\qquad\text{and}\qquad
\Gamma A(z) = \sum_{n \ge 0} \Gamma\mathcal{A}_n z^n = \sum_{\alpha \in \mathcal{A}} cu(\alpha) z^{|\alpha|},$$

then the average cost of unranking the objects in $\mathcal{A}_n$ is given by
$\mu_{n,\mathcal{A}} = [z^n]\Gamma A(z) / [z^n]A(z)$, where $A(z)$ denotes the generating
function of the class $\mathcal{A}$ (exponential GF if $\mathcal{A}$ is labelled, ordinary GF if
$\mathcal{A}$ is unlabelled), and $[z^n]f(z)$ denotes the $n$-th coefficient of the
generating function $f(z)$.

The computation of $\mu_{n,\mathcal{A}}$ for labelled classes is then possible thanks to the
application of the rules stated in the following theorem.
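Theorem 4. Let $\mathcal{A}$ be a labelled combinatorial class such that $\epsilon \notin \mathcal{A}$
(equivalently, $A(0) = 0$). Then

1. $\Gamma(\epsilon) = \Gamma(\mathcal{Z}) = 0$.

2. $\Gamma(\mathcal{A} + \mathcal{B}) = \Gamma A + \Gamma B$.

3. $\Gamma(\mathcal{A} \star \mathcal{B}) = \Theta A \cdot B + \Gamma A \cdot B + A \cdot \Gamma B$.

4. $\Gamma(\mathrm{Seq}(\mathcal{A})) = (\Theta A + \Gamma A)/(1 - A)^2$.

5. $\Gamma(\mathrm{Set}(\mathcal{A})) = \exp(A) \cdot (\Theta A + \Gamma A)$.

6. $\Gamma(\mathrm{Cycle}(\mathcal{A})) = (\Theta A + \Gamma A)/(1 - A)$.

7. $\Gamma(\mathrm{Seq}(\mathcal{A}, \mathrm{card} = k)) = k A^{k-1} (\Theta A + \Gamma A)$,

8. $\Gamma(\mathrm{Set}(\mathcal{A}, \mathrm{card} = k)) = \frac{A^{k-1}}{(k-1)!} (\Theta A + \Gamma A)$,

9. $\Gamma(\mathrm{Subst}(\mathcal{A}, \mathcal{B})) = (\Theta A + \Gamma A)(B(z)) + (\Theta B + \Gamma B) \cdot A'(B(z))$,

where the operator $\Theta$ for generating functions is $\Theta = z \frac{d}{dz}$,
$\Gamma(\mathrm{Seq}(\mathcal{A}, \mathrm{card} = 0)) = \Gamma(\mathrm{Set}(\mathcal{A}, \mathrm{card} = 0)) = 0$ and $k > 0$.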

Similar rules exist for other variants of restricted cardinality, e.g. Set(A, card ≥ k),
and for the unlabelled combinatorial constructs, although these are somewhat
more complex.

Thanks to Theorem 4 we can compute very precise estimates of $\mu_{n,\mathcal{A}}$;
however, they are based upon a few simplifications which are worth recalling here.
First of all, we do not count the preprocessing cost of parsing and converting
specifications to the so-called standard form. We also disregard the cost of
calling the count function, as we assume that this is performed as a preprocessing
step and that the values are stored into tables. Finally, we also neglect the cost
of converting the object produced by the unranking algorithm back to the
non-standard form in which the original specification was given. Last but not least,
we assume that the cost of the non-recursive part of the unranking algorithms
is exactly the number of iterations made to determine the size of the first
component in a product or sequence, the leading component in a cycle or set, etc.
For example, if the size of $\alpha$ in the object $(\alpha, \beta)$ of size $n$ which we want to
produce is $j$ then we assume that the cost of unranking is $j + cu(\alpha) + cu(\beta)$.
While this simplification does not invalidate the computation from the point of
view of the order of growth of the average complexity, a more accurate
computation is necessary to determine the constant factors and lower order terms in
the average-case complexity of unranking.
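The simplified cost model can be checked numerically on a small example. The sketch below applies the product rule of Theorem 4 coefficient by coefficient to unlabelled binary trees, B = Z + B × B, with size measured in leaves; it assumes that locating a first component of size $j$ costs exactly $j$ iterations, as stated above, and that $j$ ranges over 1, ..., n−1 because neither component can be empty. The function name `average_cost` is hypothetical.

```python
from fractions import Fraction


def average_cost(n_max):
    """Map n -> average simplified unranking cost for binary trees with n leaves."""
    b = [0, 1]       # b[n]: number of trees of size n (a single leaf for n = 1)
    gamma = [0, 0]   # gamma[n]: cumulated cost over all trees of size n
    for n in range(2, n_max + 1):
        b_n = 0
        g_n = 0
        for j in range(1, n):
            pairs = b[j] * b[n - j]
            b_n += pairs
            # j iterations to find the block, plus the two recursive costs,
            # summed over every pair (alpha, beta) in the block.
            g_n += j * pairs + gamma[j] * b[n - j] + b[j] * gamma[n - j]
        b.append(b_n)
        gamma.append(g_n)
    return {n: Fraction(gamma[n], b[n]) for n in range(1, n_max + 1)}


costs = average_cost(60)
```

For instance, the two trees of size 3 cost 2 and 3 under this model, so `costs[3]` is 5/2. Exact rational arithmetic is used so that the averages are not affected by rounding.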

3 Experiments on the Performance of Unranking 

We have implemented all the unranking algorithms for the combinatorial
constructions described in the previous section, except for unlabelled cycles (this is
a difficult open problem, see for instance [12]). Our programs^ have been written
for Maple (Maple V Release 8) and run under Linux on a Pentium 4 at 1.7
GHz with 512 MB of RAM.

^ They are available on request from the second author; send email to
molinero@lsi.upc.es.

The unranking algorithms use the basic facilities for counting and parsing of
specifications already provided by the combstruct package. We have also used
the function draw in the combstruct package for random generation in order to
compare its performance with that of our unrank function, but that is the
subject of the next section. The interface to unrank is similar to that of draw;
for instance, we might write

bintree:= B = Union(Z, Prod(B, B)) ; 

unrank([B, bintree, labelled], size = 10, rank = 3982254681); 
to obtain the following labelled binary tree of size 10: 




The function unrank also accepts several optional parameters; in particular,
we can specify which order we want to use: lexicographic (default) or
boustrophedonic.

The first piece of our experimental setup was the choice of the combinatorial
classes A to be used. Our aim was to find a representative collection. It was
especially interesting to find cases where both the unlabelled and labelled
versions made sense. As a counterexample, the labelled class Seq(Z) is interesting
(permutations), but the unlabelled class Seq(Z) is not, since there is only one
element of each size. The selected collection was^:

1. Binary trees: B = Z + B × B.

2. Unary-binary trees or Motzkin trees: M = Z + Z × M + Z × M × M.

3. Integer partitions (unlabelled):

P = Set(Seq(Z, card ≥ 1)).

4. Integer compositions (unlabelled) / Surjections (labelled):

C = Seq(Set(Z, card ≥ 1)).

5. Non-ordered rooted trees (unlabelled) / Cayley trees (labelled):

T = Z × Set(T).

6. Functional graphs (labelled):

F = Set(Cycle(T)).

^ For labelled versions of the class we have to use labelled products ⋆ instead of
the standard Cartesian product ×.

One goal of the first set of experiments that we have conducted was to
empirically measure the average cost $\bar{\mu}_{n,\mathcal{A}}$ when using lexicographic ordering
and to compare it with the theoretical asymptotic estimate $\mu_{n,\mathcal{A}}$, so that we
could determine the range of practical validity of that asymptotic estimate. A
second goal was to measure $\bar{\mu}'_{n,\mathcal{A}}$, which is similar to $\bar{\mu}_{n,\mathcal{A}}$ but
accurately counts all the arithmetical operations used by the unranking algorithms;
recall that the theoretical analysis briefly sketched in the previous section was
based upon several simplifying assumptions which disregarded some of the
necessary arithmetical operations. Nevertheless, $\bar{\mu}'_{n,\mathcal{A}}$ does not take into
account the arithmetic operations needed to parse specifications and to fill
counting tables, since once this has been done as a preprocessing phase, all
subsequent unrank or draw calls on the corresponding class do not need to
recompute the tables or to parse the specification. The inspection of the actual
code of the unranking programs suggests that
$\bar{\mu}'_{n,\mathcal{A}} \approx \bar{\mu}_{n,\mathcal{A}} + 3n$, a hypothesis which is consistent with the
data that we have collected.

We have also measured the average CPU time $\tau_{n,\mathcal{A}}$ to unrank objects of size $n$.

We have also performed the same experiments using the unranking algorithms
with boustrophedonic order. The theoretical analysis establishes that the
worst case for unranking any class using this order is $O(n \log n)$; however, there
are no specific results for its average-case complexity. The experiments support
our conjecture that the average complexity is $\Theta(n \log n)$ whenever the class is
not iterative.

Due to the huge size of the numbers involved in the arithmetical computations 
performed by unranking we have considered in our experiments objects of size 
up to 800. For instance, initializing the tables of counts to unrank objects of size 
300 can take up to 30 seconds of CPU! In each experiment we use N random 
ranks to gather the statistics; for sizes of the objects up to 300 we have used 
samples of N = 10000 ranks, whereas for large objects we have used smaller
samples, with N = 100 ranks. 

The theoretical cost of unranking binary trees (unlabelled or labelled) in
lexicographic order is
$\mu_{n,\mathcal{B}} = \frac{\sqrt{\pi}}{2} n\sqrt{n} - \frac{1}{2} n + \frac{1}{16} + o(1) = 0.8862\,n\sqrt{n} - 0.5\,n + 0.0625 + o(1)$.
The best fit for the measured data is
$\bar{\mu}_{n,\mathcal{B}} = 0.897\,n\sqrt{n} - 0.673\,n + 2.086$.
The collected data shows that the theoretical asymptotic estimate is very
accurate even for n = 50, the relative error being less than 1.1% (see Figure 3).
Looking at the plot itself the difference is almost unnoticeable.

All the samples used in these experiments had N = 10000 random ranks.
The average CPU times (in seconds) also grow as $n\sqrt{n}$ in the considered range
of sizes; for much larger sizes, it is not reasonable to assume that the cost of
each single arithmetic operation is constant. The experimental data in the case
of unlabelled binary trees does not substantially differ and the conclusions are
basically identical, except that average CPU times are noticeably smaller, as
the magnitude of the involved counts is very small compared with that of the
labelled case.

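On the other hand, the experimental data for $\bar{\mu}'_{n,\mathcal{B}}$ supports the hypothesis
$\bar{\mu}'_{n,\mathcal{B}} \approx \bar{\mu}_{n,\mathcal{B}} + 3n$; a best fit gives
$\bar{\mu}'_{n,\mathcal{B}} = 1.010436960\,\bar{\mu}_{n,\mathcal{B}} + 2.784136376\,n + 7.752450064$.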



  n    $\bar{\mu}_{n,\mathcal{B}}$   $\bar{\mu}'_{n,\mathcal{B}}$   $\tau_{n,\mathcal{B}}$
 50      285.21      432.21     0.04
100      829.93     1130.86     0.13
150     1559.29     1995.21     0.26
200     2401.32     3003.24     0.43
250     3378.20     4100.15     0.66
300     4467.72     5364.72     0.91

(a) Experimental data.

(b) Plot of the experimental $\bar{\mu}_{n,\mathcal{B}}$ vs. the theoretical $\mu_{n,\mathcal{B}}$.

Fig. 3. Unranking binary trees of size n in lexicographic order.



Figure 4 gives the experimental data for the unranking of labelled binary
trees using boustrophedonic order. The data is consistent with the hypothesis
that on average this cost is $\Theta(n \log n)$ and in particular the best fit curve is

$0.623\,n \ln n + 1.379\,n - 3.304$.

If we compute the best fit for the data corresponding to the boustrophedonic
unranking of unlabelled binary trees we get

$0.653\,n \ln n + 1.198\,n + 0.116$,

which suggests that the performance of the labelled and unlabelled versions of
the boustrophedonic unranking for products is essentially the same. We will
see later that this is no longer true when we compare the performance of the
unranking (either lexicographic or boustrophedonic) of labelled and unlabelled
sets.

The collected data for boustrophedonic unranking involves objects of sizes
from n = 50 to n = 300, with samples of N = 10000 random ranks. Side by side,
we give the corresponding values for lexicographic order to ease the comparison.
The graph to the right also plots $\bar{\mu}_{n,\mathcal{B}}$ for boustrophedonic order (solid line)
and lexicographic order (dashed line); the $n \log n$ behavior of the first versus the
$n\sqrt{n}$ of the second is made quite evident in the plot.
  n    $\bar{\mu}_{n,\mathcal{B}}$   $\bar{\mu}^{[lex]}_{n,\mathcal{B}}$   $\tau_{n,\mathcal{B}}$   $\tau^{[lex]}_{n,\mathcal{B}}$
 50      187.75      285.21     0.03     0.04
100      421.21      829.93     0.07     0.13
150      672.17     1559.29     0.11     0.26
200      932.90     2401.32     0.15     0.43
250     1202.21     3378.20     0.21     0.66
300     1476.65     4467.72     0.26     0.91

(a) Experimental data.

(b) Plot of the experimental data for boustrophedonic order vs. lexicographic
order.

Fig. 4. Unranking binary trees of size n in boustrophedonic order.
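As a quick sanity check of the reported fit, the following Python sketch evaluates the best-fit curve for labelled binary trees in boustrophedonic order against the measured averages of Fig. 4(a). The data and coefficients are copied from the text; the names `measured`, `fitted` and `relative_errors` are assumptions of this illustration.

```python
from math import log

# (n, measured average cost) for boustrophedonic unranking, from Fig. 4(a).
measured = [(50, 187.75), (100, 421.21), (150, 672.17),
            (200, 932.90), (250, 1202.21), (300, 1476.65)]


def fitted(n):
    """Best-fit curve reported in the text: 0.623 n ln n + 1.379 n - 3.304."""
    return 0.623 * n * log(n) + 1.379 * n - 3.304


def relative_errors(data, curve):
    """Relative deviation of the curve from each measured value."""
    return [abs(curve(n) - y) / y for n, y in data]


errors = relative_errors(measured, fitted)
```

Running the same comparison with a curve of the form $a\,n\sqrt{n} + b\,n + c$ would be one way to see how clearly the data separates the two growth rates.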



Functional graphs ($\mathcal{F}$) are sets of cycles of Cayley trees. For instance, the
graph for the function $f$, with $f(1) = 8$, $f(2) = 6$, $f(3) = 2$, $f(4) = 5$, $f(5) = 10$,
$f(6) = 5$, $f(7) = 5$, $f(8) = 1$, $f(9) = 10$, $f(10) = 6$ is

[Figure: functional graph of f.]

which is the 5000000-th element of size 10 in $\mathcal{F}$. The theoretical analysis of the
cost of unranking such objects gives

$$\mu_{n,\mathcal{F}} = \frac{\sqrt{2\pi}}{4} n\sqrt{n} + \frac{7}{3} n + O(\sqrt{n}) = 0.6266\,n\sqrt{n} + 2.3333\,n + O(\sqrt{n}).$$

Now, the best fit for the measured data is $\bar{\mu}_{n,\mathcal{F}} = 0.609\,n\sqrt{n} + 2.532\,n$, in good
accordance with the asymptotic estimate; the relative error is less than 1% in
the range of sizes we have considered (see Figure 5). Again the plots of
$\mu_{n,\mathcal{F}}$ and $\bar{\mu}_{n,\mathcal{F}}$ show almost no difference.

When comparing $\bar{\mu}_{n,\mathcal{F}}$ to $\bar{\mu}'_{n,\mathcal{F}}$ we again find further evidence for the relation
$\bar{\mu}'_{n,\mathcal{F}} \approx \bar{\mu}_{n,\mathcal{F}} + 3n$; the comparison of the performance of the boustrophedonic
order with that of the lexicographic order is qualitatively similar to what we found in
the case of binary trees.

Also, we find that the average cost of unranking in boustrophedonic order seems
to be $\Theta(n \log n)$; the best fit for the experimental data (see Table 1) is

$\bar{\mu}_{n,\mathcal{F}} = 2.427\,n \ln n - 6.235\,n + 77.612$.

Together with the other examples of non-iterative classes (binary trees,
Motzkin trees, Cayley trees, ...) the data for the boustrophedonic unranking
of functional graphs is consistent with our conjecture that the average cost
of boustrophedonic unranking is $\Theta(n \log n)$ for non-iterative classes (it is linear
for iterative classes).

  n    $\bar{\mu}_{n,\mathcal{F}}$   $\tau_{n,\mathcal{F}}$
 50      338.79     0.05
100      861.11     0.13
150     1499.93     0.23
200     2239.00     0.36
250     3038.26     0.57
300     3928.05     0.79

(a) Experimental data.

(b) Plot of $\bar{\mu}_{n,\mathcal{F}}$ vs. $\mu_{n,\mathcal{F}}$.

Fig. 5. Unranking functional graphs of size n in lexicographic order.



Table 1. Unranking functional graphs of size n in boustrophedonic order.

  n    $\bar{\mu}_{n,\mathcal{F}}$   $\tau_{n,\mathcal{F}}$
 50      237.74     0.04
100      577.06     0.10
150      968.80     0.17
200     1398.67     0.26
250     1865.22     0.36
300     2364.55     0.48

The results for the remaining considered classes are very similar. We
summarize here our main findings.

For the lexicographic unranking of Motzkin trees we have
$\mu_{n,\mathcal{M}} = \frac{\sqrt{3\pi}}{6} n\sqrt{n} + O(n) = 0.5116\,n\sqrt{n} + O(n)$, while the best fit for
the experimental data gives us $\bar{\mu}_{n,\mathcal{M}} = 0.518\,n\sqrt{n} - 0.454\,n + 6.74$, with
relative errors less than 1% between the asymptotic estimate and the empirical
values already for n = 50. For boustrophedonic unranking the best fit is
$\bar{\mu}_{n,\mathcal{M}} = 0.642\,n \ln n + 0.6\,n + 1.609$.

If we repeat the experiments for labelled Motzkin trees (which have the same
theoretical performance as in the unlabelled case) we find that the best fit for
lexicographic unranking is $0.513\,n\sqrt{n} - 0.355\,n + 4.54$ and for boustrophedonic
it is $0.633\,n \ln n + 0.655\sqrt{n} + 0.872$, again supporting the equivalence of labelled
and unlabelled unranking when only unions, products and sequences are involved,
even for boustrophedonic unranking.

For the unlabelled class $\mathcal{P}$ of integer partitions, the best fit for lexicographic
unranking is $\bar{\mu}_{n,\mathcal{P}} = 1.8162\,n + 19.382$, while the best fit for boustrophedonic
unranking is $\bar{\mu}_{n,\mathcal{P}} = 1.95\,n + 24.301$. The theoretical analysis gives
$\mu_{n,\mathcal{P}} = 2n$ for lexicographic unranking.
If we consider the unlabelled class C = Seq(Set(Z, card ≥ 1)) of integer
compositions then the best fit curves are $2.7566\,n - 0.145\sqrt{n} + 0.2221$ for
lexicographic unranking and $3.0083\,n - 0.1952\sqrt{n} + 0.5066$ for boustrophedonic
unranking. Here, the lexicographic unranking has theoretical average cost
$\mu_{n,\mathcal{C}}$ = |^+ Interestingly enough, for the labelled class
C = Seq(Set(Z, card ≥ 1)) of surjections the corresponding best fits are $2.002\,n$
and $2.9122\,n + 0.0584\sqrt{n} - 2.855$. This confirms that while the labelled and
unlabelled versions of the unranking algorithms for unions, products and
sequences behave identically, it is not the case for labelled and unlabelled sets.
Also, these experiments indicate that boustrophedonic unranking could be
slightly more inefficient than lexicographic unranking for iterative classes,
contrary to what happens with the non-iterative classes, where boustrophedonic
unranking clearly outperforms lexicographic unranking, even for small n.

4 An Empirical Comparison of Unranking and Random 
Generation 

From a theoretical point of view this question has a clear-cut answer: they have
identical complexity. However, the hidden constant factors and lower order terms
may markedly differ; the goal of the experiments described in this short section is
to provide evidence about this particular aspect. We have chosen the
implementation of random generation already present in the combstruct package for a
fair comparison, since the platform, programming language, etc. are then identical.
From the point of view of users, the underlying ordering of the class is irrelevant
when generating objects at random, hence the default order used by the function
draw is the boustrophedonic order, which is never significantly worse than the
lexicographic order and is frequently much faster (see Section 3). Hence, to
get a meaningful comparison we will also use boustrophedonic unranking in all
the experiments of this section.

As in the previous section we will not take into account the preprocessing
needed by both unranking and random generation; therefore we will “start the
chrono” once the initialization of counting tables and the parsing of specifications
have already been finished. Also, in the previous section we have given the data
concerning the average CPU time for unranking; for the largest objects (n = 800)
this time is around 1.2 seconds; for moderately sized objects (n = 300) the
average time is typically around 0.3 seconds. Since we want here to compare
unranking and random generation we will systematically work with the ratio
$\rho_{n,\mathcal{A}}$ between the average time of unrank and the average time of draw.

Table 2 summarizes this comparative analysis. Here $\mathcal{B}$ denotes unlabelled
binary trees, $\mathcal{F}$ are the (labelled) functional graphs, $\mathcal{M}$ denotes unlabelled Motzkin
trees and $\mathcal{P}$ the (unlabelled) class of integer partitions. The data varies widely
from one case to another, so that no firm conclusions can be drawn. Some
preliminary results (with small samples for large sizes) confirm the theoretical
prediction that $\rho_{n,\mathcal{A}}$ must tend to a constant $\rho_{\mathcal{A}}$ as $n \to \infty$, but there seems
to be no easy rule to describe or compute these constants. For instance, the first
three classes are non-iterative and the fourth is iterative, but there is no
“commonality” in the behavior of the first three classes. Nor is there such
commonality between the arborescent classes $\mathcal{B}$ and $\mathcal{M}$, or the unlabelled classes.
From the small collection we have worked with, however, it seems that $\rho_{\mathcal{A}}$ will
typically be between 1.0 and 3.0. Further tests and analytical work are necessary
to confirm this hypothesis, but if it were so, then sampling based on unranking
plus Floyd’s algorithm should be the alternative of choice as long as the number
of elements to be sampled were > 1000.
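Sampling without repetition via unranking can be sketched as follows in Python. The first function is the classical algorithm of Floyd for drawing k distinct integers; the second combines it with an unranking routine. Both `count` and `unrank` are passed in as parameters and are hypothetical placeholders for whatever counting and unranking functions are available; nothing here reproduces the Maple implementation.

```python
import random


def floyd_sample(total, k, rng=random):
    """Return a set of k distinct integers drawn uniformly from range(total)."""
    if not 0 <= k <= total:
        raise ValueError("need 0 <= k <= total")
    chosen = set()
    for j in range(total - k, total):
        t = rng.randint(0, j)  # uniform on 0..j, both ends included
        # If t was already picked, j itself cannot have been picked yet.
        chosen.add(j if t in chosen else t)
    return chosen


def sample_objects(n, k, count, unrank, rng=random):
    """Draw k distinct objects of size n by unranking k distinct random ranks."""
    ranks = floyd_sample(count(n), k, rng)
    return [unrank(n, r) for r in sorted(ranks)]


demo_ranks = floyd_sample(10 ** 30, 50, random.Random(1))
```

Only k random numbers and k unranking calls are needed, and no rejection step is required to avoid duplicates, which is why the approach becomes attractive when many distinct objects must be sampled.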



Table 2. Ratios of average times for unranking and random generation.

  n    $\rho_{n,\mathcal{B}}$   $\rho_{n,\mathcal{F}}$   $\rho_{n,\mathcal{M}}$   $\rho_{n,\mathcal{P}}$
 50     1.00     4.00     1.50     1.00
100     1.16     3.33     1.75     1.00
150     1.22     2.83     2.20     2.00
200     1.07     2.88     2.12     2.00
250     1.10     2.76     2.10     3.00
300     1.08     2.66     2.45     2.66
Abstract. A vertex $i$ of a graph $G = (V, E)$ is said to be controlled
by $M \subseteq V$ if the majority of the elements of the neighborhood of
$i$ (including itself) belong to $M$. The set $M$ is a monopoly in $G$ if
every vertex $i \in V$ is controlled by $M$. Given a set $M \subseteq V$ and two
graphs $G_1 = (V, E_1)$ and $G_2 = (V, E_2)$ where $E_1 \subseteq E_2$, the MONOPOLY
VERIFICATION PROBLEM (MVP) consists of deciding whether there exists
a sandwich graph $G = (V, E)$ (i.e., a graph where $E_1 \subseteq E \subseteq E_2$)
such that $M$ is a monopoly in $G = (V, E)$. If the answer to the MVP
is No, we then consider the MAX-CONTROLLED SET PROBLEM (MCSP),
whose objective is to find a sandwich graph $G = (V, E)$ such that
the number of vertices of $G$ controlled by $M$ is maximized. The MVP
can be solved in polynomial time; the MCSP, however, is NP-hard. In
this work, we present a deterministic polynomial time approximation
algorithm for the MCSP with ratio | where $n = |V| > 4$.

(The case $n \le 4$ is solved exactly by considering the parameterized
version of the MCSP.) The algorithm is obtained through the use of
randomized rounding and derandomization techniques, namely the
method of conditional expectations. Additionally, we show how to
improve this ratio if good estimates of expectation are obtained in advance.



1 Preliminaries 

Given two graphs $G_1 = (V, E_1)$ and $G_2 = (V, E_2)$ such that $E_1 \subseteq E_2$, we say
that $G = (V, E)$, where $E_1 \subseteq E \subseteq E_2$, is a sandwich graph for some property
$\Pi$ if $G = (V, E)$ satisfies $\Pi$. A sandwich problem consists of deciding whether
there exists some sandwich graph satisfying $\Pi$. Many different properties may
be considered in this context. In general, the property $\Pi$ is non-hereditary by
(not induced) subgraphs (otherwise $G_1$ would trivially be a solution, if any) and
non-ancestral by supergraphs (otherwise $G_2$ would trivially be a solution, if any).
As discussed by Golumbic et al. [7], sandwich problems generalize recognition
problems arising in various situations (when $G_1 = G_2$, the sandwich problem
becomes simply a recognition problem).
One of the best known sandwich problems is the CHORDAL SANDWICH PROBLEM,
where we require G to be a chordal graph (a graph where every cycle of
length at least four possesses a chord, that is, an edge linking two
non-consecutive vertices in the cycle). The CHORDAL SANDWICH PROBLEM is closely
related to the MINIMUM FILL-IN PROBLEM [19]: given a graph G, find the minimum
number of edges to be added to G so that the resulting graph is chordal. The
MINIMUM FILL-IN PROBLEM has applications to areas such as the solution of sparse
systems of linear equations [15]. Another important sandwich problem is the
INTERVAL SANDWICH PROBLEM, where we require the sandwich graph G to be an
interval graph (a graph whose vertices are in a one-to-one correspondence with
intervals on the real line in such a way that there exists an edge between two
vertices if and only if the corresponding intervals intersect). Kaplan and
Shamir [8] describe applications to DNA physical mapping via the INTERVAL
SANDWICH PROBLEM. In this work we consider a special kind of sandwich problem,
the MAX-CONTROLLED SET PROBLEM (MCSP) [11], which is described in the sequel.

Given an undirected graph $G = (V, E)$ and a set of vertices $M \subseteq V$, a
vertex $i \in V$ is said to be controlled by $M$ if $|N_G[i] \cap M| \ge |N_G[i]|/2$, where
$N_G[i] = \{i\} \cup \{j \in V \mid (i, j) \in E\}$. The set $M$ defines a monopoly in $G$ if every
vertex $i \in V$ is controlled by $M$. Following the notation of [11], if $cont(G, M)$
denotes the set of vertices controlled by $M$ in $G$, $M$ will be a monopoly in $G$ if
and only if $cont(G, M) = V$.
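The definition of $cont(G, M)$ translates directly into code. The Python sketch below assumes that "controlled" means that at least half of the closed neighborhood lies in $M$, and represents edges as pairs of vertices; the function names are introduced here for illustration only.

```python
def closed_neighborhoods(vertices, edges):
    """Map each vertex i to N[i] = {i} together with its neighbors."""
    nbrs = {v: {v} for v in vertices}
    for i, j in edges:
        nbrs[i].add(j)
        nbrs[j].add(i)
    return nbrs


def controlled(vertices, edges, M):
    """Set of vertices controlled by M: |N[i] & M| >= |N[i]| / 2."""
    M = set(M)
    nbrs = closed_neighborhoods(vertices, edges)
    return {v for v in vertices if 2 * len(nbrs[v] & M) >= len(nbrs[v])}


def is_monopoly(vertices, edges, M):
    """M is a monopoly when every vertex is controlled."""
    return controlled(vertices, edges, M) == set(vertices)


path = [(1, 2), (2, 3)]
controlled_by_middle = controlled([1, 2, 3], path, {2})
```

On the path 1 - 2 - 3 with M = {2}, the two end vertices are controlled while the middle one is not, since only one of the three members of its closed neighborhood belongs to M.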

In order to formally define the MCSP, we first define the MONOPOLY
VERIFICATION PROBLEM (MVP): given a set $M \subseteq V$ and two graphs $G_1 = (V, E_1)$
and $G_2 = (V, E_2)$, where $E_1 \subseteq E_2$, the question is to decide whether there exists
a set $E$ such that $E_1 \subseteq E \subseteq E_2$ and $M$ is a monopoly in $G = (V, E)$. If the
answer of the MVP applied to $M$, $G_1$, and $G_2$ is No, we then consider the MCSP,
whose goal is to find a set $E$ such that $E_1 \subseteq E \subseteq E_2$ and the number of vertices
controlled by $M$ in $G = (V, E)$ is maximized.

The MVP can be solved in polynomial time by formulating it as a network
flow problem [11]. If the answer to the MVP is No, then a natural alternative is
to solve the MCSP. Unfortunately, the MCSP is NP-hard, even for those instances
where $G_1$ is an empty graph and $G_2$ is a complete graph. In [11] a reduction from
INDEPENDENT SET to the MCSP is given. In the same work, an approximation
algorithm for the MCSP with ratio $\frac{1}{2}$ is presented.

The notion of monopoly has applications to local majority voting in dis- 
tributed environments and agreement in agent systems [1,6,10,13,16,17]. For in- 
stance, suppose that the agents must agree on one industrial standard between 
two proposed candidate standards. Suppose also that the candidate standard 
supported by the majority of the agents is to be selected. When every agent 
knows the opinion of his neighbors, a natural heuristic to obtain a reasonable
agreement is: every agent $i$ takes the majority opinion in $N[i]$. This is known as
the deterministic local majority polling system. In such a system, securing the 
support by the members of a monopoly M implies securing unanimous agree- 
ment. In this context, the motivation for the MCSP is to find an efficient way of 
controlling the maximum number of objects by modifying the system’s topology. 
In this work, we present a linear integer programming formulation and a
randomized rounding procedure for the MCSP. As far as we know, our procedure
achieves the best polynomial time approximation ratio for the MCSP. If $y^*$
denotes the optimum value of the linear relaxation and $y^* > A(k)$ (for some fixed
$k \in (1, 2]$ and some function $A(k) > 4$), the approximation ratio
improves the $\frac{1}{2}$-approximation algorithm presented in [11]. As described later,
the case $y^* < A(k)$ may be solved exactly by considering a polynomial time
algorithm for the parameterized version of the MCSP. This procedure is based
on the ideas presented in [11] for the MVP.

This paper is organized as follows. In Section 2 some basic notation and 
results from [11] are presented. These are fundamental for the development of 
our algorithm. In Section 3, we introduce the parameterized MCSP. For a given 
parameter A > 0, we solve exactly the parameterized mcsp in time O(n^). 
Section 4 gives a detailed description of our MCSP formulation and outlines our 
randomized rounding procedure. In randomized rounding techniques, we first 
solve the linear relaxation and “round” the resulting solution to produce feasible 
solutions. In Section 5 we present an approximation analysis via the probabilistic 
method (introduced by Erdos and Spencer [5]). In this case, the main objective 
is to construct probabilistic existence proofs of some particular combinatorial 
structure for actually exhibiting this structure. This is performed through the 
use of derandomization techniques. In Section 6 we describe a derandomized 
procedure via the method of conditional expectations, achieving an improved 
deterministic approximation algorithm for the MCSP with performance ratio ^ -I- 

Finally, in Section 7, we present some conclusions and suggestions for 
future work. 

2 The $\frac{1}{2}$-Approximation Algorithm for the MCSP by
Makino et al.

Consider a maximization optimization problem $P$ and an arbitrary input
instance $I$ of $P$. Denote by $z(I)$ the optimal objective function value for $I$, and by
$z_H(I)$ the value of the objective function delivered by an algorithm $H$. Without
loss of generality, it is assumed that each feasible solution for $I$ has a non-negative
objective function value. Recall that $H$ is a $\varphi$-approximation algorithm for $P$ if
and only if a feasible solution of value $z_H(I) \ge \varphi z(I)$ is delivered for all instances
$I$ and some $\varphi$ satisfying $0 < \varphi \le 1$.

From now on, we suppose that the answer for the MVP when applied to
$M$, $G_1$ and $G_2$ is No. Let us briefly describe the deterministic $\frac{1}{2}$-approximation
algorithm for the MCSP presented in [11]. For $A, B \subseteq V$, define the edge set
$D(A, B) = \{(i, j) \in E_2 \setminus E_1 \mid i \in A, j \in B\}$. Let $U = V \setminus M$. Two reduction
rules are used: a new edge set $E_1'$ is obtained by the union of $E_1$ and $D(M, M)$,
and a new edge set $E_2'$ is obtained by removing $D(U, U)$ from $E_2$. Since we
are maximizing the total number of vertices controlled by $M$, these reduction
rules do not modify the optimal solution. In other words, the edge set $E$ in the
sandwich graph $G$ satisfies $E_1 \cup D(M, M) \subseteq E \subseteq E_1 \cup D(M, M) \cup D(U, M)$.
For simplicity, assume from now on $E_1 = E_1'$ (Reduction Rule 1) and $E_2 = E_2'$
(Reduction Rule 2). In the MYK algorithm, $W_1, W_2 \subseteq V$ denote the sets of
vertices controlled by $M$ in $G = (V, E)$ for $E = E_1$ and $E = E_2$, respectively.

Algorithm 1: MYK algorithm [11] 

1. compute jlFij by removing from G 2 the edge set D{U,M) 

2. compute IIF 2 I by adding to Gi the edge set D{U,M) 

3. ZH ^ max{|lFi|, IIF 2 I} 

Formally, they proved the following result: 

Theorem 1. The value $z_H$ returned by Algorithm 1 satisfies $z_H(I) \ge \frac{1}{2} z(I)$,
for all instances $I$ of the MCSP.
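The two reduction rules and the three steps of Algorithm 1 can be sketched in a few lines. This is only an illustrative sketch: it assumes the usual control rule (a vertex $v$ is controlled by $M$ when at least half of its closed neighborhood $N[v]$ lies in $M$), and the function names `controlled` and `myk`, as well as the representation of edges as two-element `frozenset`s, are choices made here, not notation from [11].

```python
def controlled(vertices, edges, M):
    """Vertices v with |N[v] ∩ M| >= |N[v] \\ M| (assumed control rule)."""
    result = set()
    for v in vertices:
        nb = {v}                      # closed neighborhood N[v]
        for e in edges:
            if v in e:
                nb |= e
        in_m = len(nb & M)
        if in_m >= len(nb) - in_m:
            result.add(v)
    return result


def myk(vertices, E1, E2, M):
    """Value max{|W1|, |W2|} as in Algorithm 1 (edges are 2-element frozensets)."""
    M = set(M)
    U = set(vertices) - M
    optional = E2 - E1
    d_mm = {e for e in optional if e <= M}   # D(M, M)
    d_uu = {e for e in optional if e <= U}   # D(U, U)
    d_um = optional - d_mm - d_uu            # D(U, M)
    e1 = E1 | d_mm                           # Reduction Rule 1
    e2 = E2 - d_uu                           # Reduction Rule 2
    w1 = controlled(vertices, e2 - d_um, M)  # step 1: remove D(U, M) from G2
    w2 = controlled(vertices, e1 | d_um, M)  # step 2: add D(U, M) to G1
    return max(len(w1), len(w2))             # step 3
```

On a four-vertex instance with $M = \{1,2\}$, fixed edges $(1,3)$, $(3,4)$ and optional edges $(2,3)$, $(1,2)$, $(2,4)$, the sketch returns 3.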

3 Parameterizing the MCSP 

In this section we introduce the parameterized MCSP. Let $A$ be a fixed non-negative
integer. The objective is to find, in polynomial time, a solution for the
MCSP with value at least $A$. In other words, we require the parameter $A$ to be a
lower bound for the maximum number of vertices that can be controlled by $M$
in a sandwich graph $G = (V,E)$ with $E_1 \subseteq E \subseteq E_2$.

Let us describe an $O(n^{A})$ algorithm for the parameterized MCSP. We first
consider a partition of $V$ into six special subsets (some of them are implicitly
described in [11]):

– $M_C$ and $U_C$, consisting of the vertices in $M$ and $U$, respectively, which
are controlled by $M$ in any sandwich graph (vertices which are “always
controlled”);

– $M_N$ and $U_N$, consisting of the vertices in $M$ and $U$, respectively, which
are not controlled by $M$ in any sandwich graph (vertices which are “never
controlled”);

– $M_R$ and $U_R$, defined as $M_R = M \setminus (M_C \cup M_N)$ and $U_R = U \setminus (U_C \cup U_N)$.

Define the binary variables $x_{ij} \in \{0,1\}$ for $i,j \in V$ and assume that $x_{ij} = x_{ji}$,
$\forall i,j \in V$. Define binary constants $a_{ij} \in \{0,1\}$ such that $a_{ij} = 1$ if and only if
$i = j$ or $(i,j) \in E_2$. Consider now the following auxiliary equations:

$$B_i = \sum_{j \in M} a_{ij} x_{ij} - \sum_{j \in U} a_{ij} x_{ij}, \quad \text{for } i = 1, \ldots, |V| \qquad (1)$$

In these equations, assume that: $x_{ii} = 1$ for every $i \in V$, $x_{ij} = 1$ for every
$(i,j) \in E_1$, and $x_{ij} = 0$ for every $(i,j) \notin E_2$. This means that the remaining
binary variables are associated to edges in $E_2 \setminus E_1$ (set of optional edges). It is
clear that $B_i \ge 0$ for some 0-1 assignment (to variables associated to optional
edges) if and only if vertex $i$ can be controlled by $M$ in some sandwich graph.
Observe that the subsets $M_C, U_C, M_N, U_N$ can be thus characterized by the
following properties:

– $i \in M_C \cup U_C$ if and only if $B_i \ge 0$ for every 0-1 assignment;

– $i \in M_N \cup U_N$ if and only if $B_i < 0$ for every 0-1 assignment.

In fact, it is easy to construct these four sets, since it is sufficient to look
at “worst-case” assignments. For instance, if $i \in M$ then $i \in M_C$ if and only if
$B_i \ge 0$ for a 0-1 assignment which sets $x_{ij} = 1$ for every optional edge $(i,j)$ with $j \in U$.
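The worst-case argument above can be made concrete with a short sketch. It evaluates $B_i$ from equations (1) under the most favorable and the least favorable choice of optional edges incident to $i$; the function name `classify` and the `frozenset` edge representation are assumptions made for illustration only.

```python
def classify(vertices, E1, E2, M):
    """Split V into always-controlled, never-controlled and remaining vertices."""
    always, never, rest = set(), set(), set()
    for i in vertices:
        fixed = 1 if i in M else -1          # the term a_ii * x_ii with x_ii = 1
        gain = loss = 0                      # optional edges towards M / towards U
        for e in E2:
            if i not in e:
                continue
            (j,) = e - {i}                   # the other endpoint of the edge
            sign = 1 if j in M else -1
            if e in E1:                      # fixed edge: x_ij = 1
                fixed += sign
            elif sign > 0:
                gain += 1
            else:
                loss += 1
        best, worst = fixed + gain, fixed - loss
        if worst >= 0:                       # B_i >= 0 for every assignment
            always.add(i)
        elif best < 0:                       # B_i < 0 for every assignment
            never.add(i)
        else:
            rest.add(i)
    return always, never, rest
```

The first two returned sets correspond to $M_C \cup U_C$ and $M_N \cup U_N$; intersecting each of the three sets with $M$ and $U$ gives the six subsets of the partition.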

Use now the following new reduction rules: 

– set $x_{ij} = 1$ for every $(i,j) \in D(M_C \cup M_N, U_R)$ (Reduction Rule 3);

– set $x_{ij} = 0$ for every $(i,j) \in D(M_R, U_C \cup U_N)$ (Reduction Rule 4);

– set arbitrary values to variables $x_{ij}$ for $(i,j) \in D(M_C \cup M_N, U_C \cup U_N)$
(Reduction Rule 5).

It is clear that if $A \le |M_C| + |U_C|$ then the algorithm for the parameterized
MCSP answers Yes. Hence, from now on, assume that $A > |M_C| + |U_C|$.

Clearly, $M_R = \emptyset$ if and only if $U_R = \emptyset$. Thus, if $M_R = \emptyset$ or $U_R = \emptyset$, the
algorithm must answer No. Add then the assumption $M_R, U_R \ne \emptyset$.

We fix an arbitrary subset $S \subseteq M_R \cup U_R$ of cardinality $|S| = A - |M_C| - |U_C|$,
and check whether it is possible to control $S$. Similarly to the reduction rules
described above, set $x_{ij} = 1$ for every edge $(i,j) \in D(M_R \setminus S, U_R \cap S)$, and
$x_{ij} = 0$ for every edge $(i,j) \in D(M_R \cap S, U_R \setminus S)$. Finally, set $x_{ij} = 0$ for every
edge $(i,j) \in D(M_R \cap S, U_R \cap S)$, and calculate the corresponding $B_i$'s according
to equations (1).

Following the ideas in [11], we construct a network $\mathcal{N}$ whose vertex set consists
of $S$ together with two additional vertices $s,t$. Create an edge $(s,i)$ with
capacity $B_i$ for every $i \in S \cap M_R$, an edge $(j,t)$ with capacity $B_j' = \max\{-B_j, 0\}$
for every $j \in S \cap U_R$, and an edge $(i,j)$ with capacity 1 for every $(i,j) \in
D(S \cap M_R, S \cap U_R)$. Notice that $\mathcal{N}$ can be constructed in constant time, since
its edge set contains at most $O(A^2)$ elements. Now, if the maximum flow in $\mathcal{N}$
is equal to $\sum_{j \in S \cap U_R} B_j'$, then $S$ can be controlled by selecting the edges of the
form $(i,j) \in D(S \cap M_R, S \cap U_R)$ with unitary flow value. This maximum flow
problem can be solved in constant time, depending on $A$.

By repeating this procedure for every $S \subseteq M_R \cup U_R$ such that $|S| = A -
|M_C| - |U_C|$, we obtain an algorithm with complexity $O(n^{A})$.

4 An Improved Randomized Rounding Procedure for the MCSP

The definition of performance ratio in randomized approximation algorithms is
the same as in the deterministic ones. In this case, however, $z_H(I)$ is replaced by
$E(z_H(I))$, where the expectation is taken over the random choices made by the
algorithm. Then, an algorithm $H$ for a maximization problem is a randomized
$\varphi$-approximation algorithm if and only if $E(z_H(I)) \ge \varphi z(I)$ is delivered for all
instances $I$ and some $0 < \varphi \le 1$.

In randomized rounding techniques (introduced by Raghavan and Thompson [14]),
one usually solves a relaxation of a combinatorial optimization problem
(by using linear or semidefinite programming), and uses randomization to return
from the relaxation to the original optimization problem. The main idea is to
use fractional solutions to define tuned probabilities in the randomized rounding
procedure. Additional executions of this randomized procedure arbitrarily
reduce the failure probability (Monte Carlo method).

In order to introduce a new integer programming formulation for the MCSP,
we define the binary variables $z_i$ for $i \in V$, which determine whether vertex $i$ is
controlled or not by $M$. Binary variables $x_{ij}$ are used to decide whether optional
edges belonging to $E_2 \setminus E_1$ will be included or not in the sandwich graph. The
objective function (2) computes the maximum number of controlled vertices. As
defined before, binary constants $a_{ij} \in \{0,1\}$ are associated to edges $(i,j) \in E_2$
with $a_{ij} = 1$ if and only if $i = j$ or $(i,j) \in E_2$. (Assume that $a_{ij} = a_{ji}$, $\forall i,j \in V$.)

Inequalities (3) guarantee that every time a vertex $i$ is controlled by $M$, the left
hand side will be greater than or equal to 1. On the other hand, if the left hand
side is less than 1, vertex $i$ will not be controlled by $M$ and $z_i$ will be set to 0. The
divisions by $n$ are used to maintain the difference between the two summations
always greater than $-1$. Equalities (4) define the set of fixed edges. The linear
programming relaxation is obtained by replacing integrality constraints (5) and
(6) by $x_{ij} \in [0,1]$ and $z_i \in [0,1]$, respectively.



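$$z = \max \sum_{i \in V} z_i \qquad (2)$$

subject to:

$$\sum_{j \in M} \frac{a_{ij}}{n} x_{ij} - \sum_{j \in V} \frac{a_{ij}}{2n} x_{ij} + 1 \ge z_i, \quad \forall i \in V \qquad (3)$$

$$x_{ij} = 1, \quad \forall (i,j) \in E_1 \qquad (4)$$

$$x_{ij} \in \{0,1\}, \quad \forall (i,j) \in E_2 \setminus E_1 \qquad (5)$$

$$z_i \in \{0,1\}, \quad \forall i \in V \qquad (6)$$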



It is assumed from now on that (x*,z*) and y* will denote, respectively, 
an optimal solution of the relaxed integer programming formulation and its 
associated objective function value. The value of the original integer problem 
will be denoted by z. 

The value of linear programming relaxation may be improved if the reduction 
rules for the MCSP are used. As will be observed in Section 5, the performance 
ratio of our randomized algorithm is based on the value of the linear relaxation 
and an improvement of this ratio is attained if good upper bounds are obtained. 
Thus, assume without loss of generality that MCSP instances satisfy Reduction 
Rules 1 to 5. 

Algorithm 2, based on randomized rounding techniques, is a Monte Carlo
procedure and delivers, in polynomial time and with high probability, a value
within a prescribed approximation ratio. In Step 3 of the algorithm we define
a function $A(k)$ for a given parameter $k \in (1,2]$ conveniently chosen. The
construction of $A(k)$ will be detailed in the next section. For the time being, an
“oracle” is used.

Algorithm 2: randomized algorithm for the mcsp 

1. compute $z_1$ using Algorithm 1

2. solve the linear programming relaxation and return $x^*$ and $y^*$

3. compute $A(k)$ for some $k \in (1,2]$ (conveniently chosen)

4. if $y^* < A(k)$
   then compute $z_H$ by executing the algorithm for the parameterized
        MCSP in Section 3 for parameters $A = \lfloor y^* \rfloor, \lfloor y^* \rfloor - 1, \ldots, 1, 0$, until
        obtaining a Yes answer
   else for each $(i,j) \in E_2 \setminus E_1$ do
            $Pr(x_{ij} = 1) = x^*_{ij}$ (constructing the integer feasible solution)
            $Pr(x_{ij} = 0) = 1 - x^*_{ij}$
        compute $z_2$ by using the integer feasible solution $x$
        $z_H \leftarrow \max\{z_1, z_2\}$

5. return $z_H$
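The rounding branch of Step 4 can be illustrated as follows. This is a sketch under stated assumptions, not the authors' implementation: the fractional values $x^*_{ij}$ are taken as given (no LP solver is called), vertices are counted as controlled when at least half of their closed neighborhood lies in $M$, and the names `round_edges`, `count_controlled` and `rounded_value` are hypothetical.

```python
import random


def round_edges(x_star, rng):
    """Keep each optional edge independently with probability x*_ij."""
    return {e for e, p in x_star.items() if rng.random() < p}


def count_controlled(vertices, edges, M):
    """Number of vertices v with |N[v] ∩ M| >= |N[v] \\ M| (assumed control rule)."""
    total = 0
    for v in vertices:
        nb = {v}
        for e in edges:
            if v in e:
                nb |= e
        in_m = len(nb & M)
        total += in_m >= len(nb) - in_m
    return total


def rounded_value(vertices, E1, x_star, M, z1, seed=0):
    """One execution of the 'else' branch of Step 4: z_H = max{z_1, z_2}."""
    rng = random.Random(seed)
    sandwich_edges = set(E1) | round_edges(x_star, rng)
    z2 = count_controlled(vertices, sandwich_edges, M)
    return max(z1, z2)
```

Calling `rounded_value` repeatedly with different seeds and keeping the best value corresponds to the additional executions of Step 4 mentioned in the text.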

We can use, for example, an interior point method in Step 2 (introduced
by Karmarkar [9]) to compute the fractional solution $x^*$, yielding in this way a
polynomial time execution for Algorithm 2. Observe that Algorithm 2 always
produces a feasible solution, and additional executions of Step 4 (for $y^* \ge A(k)$)
arbitrarily reduce the failure probability, provided that a prescribed approximation
ratio is given. Moreover, it is obviously a $\frac{1}{2}$-approximation algorithm since
Algorithm 1 was used in Step 1. As will be pointed out in the next section, this
will directly help us to build an improved approximation algorithm with ratio
$\frac{1}{k} + \frac{1+\sqrt{y^*}}{k(y^*-1)}$ for some $k \in (1,2]$, conveniently chosen. It is straightforward to
observe that, even for $k = 2$, this ratio is strictly greater than $\frac{1}{2}$, thus improving the
previous result of [11]. In addition, recall that all those instances with $y^* < A(k)$
(for some parameter $A = \lfloor y^* \rfloor$) are polynomially solved by the algorithm for the
parameterized MCSP given in Section 3.

5 Approximation Analysis 

Before proceeding to the approximation analysis, consider the following auxiliary
definitions and lemmas. We first present the notion of negative association.

Definition 1. (Negative Association) Let $X = (X_1, X_2, \ldots, X_n)$ be a vector
of random variables. The random variables $X$ are negatively associated if for
every two disjoint index sets $I, J \subseteq \{1, 2, \ldots, n\}$, $E(f(X_i, i \in I)\, g(X_j, j \in J)) \le
E(f(X_i, i \in I))\, E(g(X_j, j \in J))$ for all functions $f : \Re^{|I|} \to \Re$ and $g : \Re^{|J|} \to \Re$
that are both non-decreasing or both non-increasing.







For a more detailed study concerning negative dependence see Dubhashi and 
Ranjan [4]. 

The next lemma ensures that the lower Chernoff-Hoeffding bound (lower CH 
bound) may be applied to not necessarily independent random variables. See 
Motwani and Raghavan [12] and Dubhashi and Ranjan [4] for the proof. An 
analogous result may be established for the upper CH bound. 

Lemma 1. (Lower Chernoff-Hoeffding Bound and Negative Association)
Let $X_1, X_2, \ldots, X_n$ be negatively associated Poisson trials such that, for
$1 \le i \le n$, $Pr(X_i = 1) = p_i$, where $0 < p_i < 1$. Then, for $X = \sum_{i=1}^{n} X_i$,
$\mu = E(X) = \sum_{i=1}^{n} p_i$, and $0 < \delta \le 1$, we have that $Pr(X < (1-\delta)\mu) <
\exp(-\mu\delta^2/2)$.

Finally, consider the following auxiliary lemma:

Lemma 2. Let $X, Y$ be arbitrary random variables. Then $E(\min(X,Y)) \le
\min(E(X), E(Y))$.

Now, in order to describe the approximation analysis of Algorithm 2, we
define random variables $Z_i \in \{0,1\}$ for every $i \in V$. These variables denote the
set of vertices controlled by $M$. We also define random variables $X_{ij} \in \{0,1\}$ for
every $i,j \in V$. Assume $X_{ii} = 1$ for every $i \in V$, and $X_{ij} = 1$ for every $(i,j) \in E_1$.
Observe that variables $X_{ij}$ for $(i,j) \in E_2 \setminus E_1$ are associated to the set of optional
edges. Additionally, let $Z_H$ be the sum of not necessarily independent random
variables $Z_i \in \{0,1\}$ for $i \in V$. Thus, we have the following preliminary result:

Lemma 3. The random variables $Z_i$ for all $i \in V$ are negatively associated.

Proof: Consider two arbitrary disjoint index sets $I, J \subseteq \{1, 2, \ldots, n\}$. Then we
want to show that:

$$E\Big(\sum_{i \in I} Z_i \sum_{j \in J} Z_j\Big) - E\Big(\sum_{i \in I} Z_i\Big) E\Big(\sum_{j \in J} Z_j\Big) = \sum_{i \in I, j \in J} \big(E(Z_i Z_j) - E(Z_i)E(Z_j)\big) \le 0$$

In particular, it is easy to observe (from the definition of the MCSP) that
$Z_i$ and $Z_j$ (for $i \ne j$) are independent random variables if they simultaneously
belong to $M$ (or $U$). However, $Z_i$ and $Z_j$ are negatively associated if they are
not in the same set. Generally, for arbitrary index sets $I$ and $J$, we can establish
that $Pr(Z_j = 1 \mid Z_i = 1) \le Pr(Z_j = 1)$ or, equivalently, $Pr(Z_i = 1 \mid Z_j = 1)
\le Pr(Z_i = 1)$. Thus, for every pair $i \in I$ and $j \in J$, we have that $E(Z_i Z_j) =
Pr(Z_i Z_j = 1) = Pr(Z_i = 1) Pr(Z_j = 1 \mid Z_i = 1) \le Pr(Z_i = 1) Pr(Z_j = 1) =
E(Z_i)E(Z_j)$, which proves the lemma. □

Now, consider our relaxed integer programming formulation. For $i,j \in V$,
assume $X_{ij} = 1$ if $i = j$ or $(i,j) \in E_1$. Assume also that we assign, as described in
Algorithm 2, random values $X_{ij}$ for every $(i,j) \in E_2 \setminus E_1$. If $Z_H$ is the sum of
random variables denoting the value of the randomized solution, it follows from
constraints (3)-(6) that:

$$Z_H = \sum_{i \in V} Z_i \le \sum_{i \in V} \min\Big\{1;\ \sum_{j \in M} \frac{a_{ij}}{n} X_{ij} - \sum_{j \in V} \frac{a_{ij}}{2n} X_{ij} + 1\Big\} \qquad (7)$$

From Lemma 2 and the linearity of expectation one obtains:

$$E(Z_H) = \sum_{i \in V} E(Z_i) \le \sum_{i \in V} \min\Big\{1;\ \sum_{j \in M} \frac{a_{ij}}{n} x^*_{ij} - \sum_{j \in V} \frac{a_{ij}}{2n} x^*_{ij} + 1\Big\},$$

where $E(X_{ij}) = x^*_{ij}$. Therefore:

$$E(Z_H) \le \sum_{i \in V} \min\{1;\ z_i^*\} \;\Rightarrow\; E(Z_H) \le \sum_{i \in V} z_i^* = y^* \qquad (8)$$

Recall that Step 1 of Algorithm 2 guarantees a performance ratio equal to $\frac{1}{2}$.
Therefore, each iteration of Algorithm 2 returns a solution with $Z_H \ge z/2$
(where $z$ denotes the value of the optimal integer solution). Now, as the optimal
solution itself may be generated at random, one may conclude, without loss
of generality, that $E(Z_H)$ is strictly greater than $z/2$ (otherwise, the solution
generated by Algorithm 1 would be optimal). Thus, we assume from (8) that
$z/2 < E(Z_H) \le y^*$, where $E(Z_H) = \mu = y^*/\beta$ for some $\beta \in [1,2)$.

Now, for some $\alpha > 1$, to be considered later, define a bad event $B = \{Z_H <
y^*/\alpha\}$. Equivalently to the definition of a randomized approximation algorithm
(described in the preceding section), $Z_H$ defines a $\frac{1}{\alpha}$-approximation solution
for the MCSP if $B^c$ holds (complementary event).

How small a value for $\alpha$ can we achieve while guaranteeing good events
$B^c \ne \emptyset$? Since we expect to obtain an approximation algorithm with a superior
performance ratio (greater than $\frac{1}{2}$), it suffices to consider $\alpha \in (\beta, k)$ for some
$k \in (\beta, 2]$. The parameter $k$ will be fixed later. This gives us an improved
$\frac{1}{\alpha}$-approximation $Z_H$ with nonzero probability. As discussed later, this solution
will be made deterministic through derandomization techniques, namely, the
method of conditional expectations.

Therefore, a bad event $B$ occurs if $Z_H < y^*/\alpha$. Then:

$$Pr(B) = Pr\Big(Z_H < \frac{y^*}{\alpha}\Big) = Pr\Big(Z_H < \frac{\beta\mu}{\alpha}\Big) = Pr(Z_H < (1-\delta)\mu),$$

where $\delta = 1 - \frac{\beta}{\alpha} > 0$.

In order to apply the lower CH bound, in addition to the negative association
(Lemma 3), all probabilities $Pr(Z_i = 1)$ must lie in the interval $(0,1)$. In our
case, however, as observed in Section 3, $Pr(Z_i = 1) = 1$ for every $i \in M_C \cup U_C$
(set of vertices which are always controlled by $M$) and $Pr(Z_i = 1) = 0$ for every
$i \in M_N \cup U_N$ (set of vertices which are never controlled by $M$). Despite that,
CH bounds may be applied, since the linear programming relaxation is being
solved by some interior point method (see Wright [18]).

Therefore, from the lower CH bound and assuming $\mu > z/2$, it follows that:

$$Pr(B) < \frac{1}{\exp\big((1-\beta/\alpha)^2\, \mu/2\big)} < \frac{1}{\exp\big((1-\beta/\alpha)^2\, z/4\big)} \qquad (9)$$



This implies: 



1 

exp((l - a/py f ) - exp(i) 



Pr{B) < 



(10)
We expect that $Pr(B) < 1$ (probability of bad event). Thus, if we impose
this last condition, it follows from (10) that:



Pr{B) 



1 

exp((l - a//3)2 I) - exp(i) 



< 1 for some a G (/?, k) 



( 11 ) 



Additional executions of Step 4 in Algorithm 2 for $y^* \ge A(k)$ arbitrarily
reduce the failure probability (Monte Carlo method). Therefore, without loss of
generality, if $Pr(B) = C < 1$ is the probability of a bad event, and $\delta > 0$ is a given
error, $\lceil \log \delta / \log C \rceil$ iterations are sufficient to ensure a $\frac{1}{\alpha}$-approximation
algorithm with probability $1 - \delta > 0$.
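The iteration count above follows from requiring $C^t \le \delta$ for $t$ independent repetitions. A small helper (the name `repetitions` is hypothetical) makes the arithmetic explicit:

```python
import math


def repetitions(C, delta):
    """Smallest t with C**t <= delta, i.e. ceil(log(delta) / log(C))."""
    if not (0.0 < C < 1.0 and 0.0 < delta < 1.0):
        raise ValueError("C and delta must both lie strictly between 0 and 1")
    return math.ceil(math.log(delta) / math.log(C))
```

For instance, with a bad-event probability of $C = 0.5$ and a target error $\delta = 0.01$, seven repetitions suffice, since $0.5^7 < 0.01 < 0.5^6$.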

Then, we need to determine if there is some value $\alpha \in (\beta, k)$ (where $\beta \ge 1$
and $k \le 2$) for which inequality (11) makes sense. Equivalently, we expect to
obtain $(z-1)\alpha^2 - (2z\beta)\alpha + \beta^2 z > 0$ for some $\alpha \in (\beta, k)$. By solving the quadratic
equation, we obtain the roots:

$$\alpha' = \frac{\beta(z - \sqrt{z})}{z-1} \quad \text{and} \quad \alpha'' = \frac{\beta(z + \sqrt{z})}{z-1}$$

Since $\alpha > \beta$, it is easy to observe that inequality $(z \pm \sqrt{z})/(z-1) > 1$ holds
only for $\alpha''$ with $z > 1$. In addition, we expect that $\alpha'' < k$ for some $k \in (\beta, 2]$.
Thus, since $\mu = y^*/\beta$, it follows that:

$$\alpha'' = \frac{y^*(z+\sqrt{z})}{\mu(z-1)} < k \;\Rightarrow\; \frac{y^*(z+\sqrt{z})}{k(z-1)} < \mu \le y^* \qquad (12)$$



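Therefore:

$$\frac{y^*(z+\sqrt{z})}{k(z-1)} < y^* \;\Rightarrow\; z + \sqrt{z} < k(z-1) \qquad (13)$$

Now, inequality (13) holds only for $z > A(k)$ with:

$$A(k) = \frac{2k(k-1) + 1 + \sqrt{4k(k-1)+1}}{2(k-1)^2}$$

Notice that constraint $z > 1$ above is immediately verified since we have
$A(k) \ge 4$ for every $k \in (1,2]$. Finally, from expression (12), since $E(Z_H) = \mu$,
$z \le y^*$ and $y^* \ge A(k)$, it follows that:

$$E(Z_H) > \left(\frac{z+\sqrt{z}}{k(z-1)}\right) y^* \ge \left(\frac{y^*+\sqrt{y^*}}{k(y^*-1)}\right) y^* = \left(\frac{1}{k} + \frac{1+\sqrt{y^*}}{k(y^*-1)}\right) y^* \qquad (14)$$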



Moreover, observe from the above expression that:

$$\frac{1+\sqrt{y^*}}{k(y^*-1)} > 0 \quad \text{and} \quad \lim_{y^* \to \infty} \frac{1+\sqrt{y^*}}{k(y^*-1)} = 0 \quad \text{for every } y^* \ge A(k).$$

Thus, inequality (14) gives us a randomized $\left(\frac{1}{k} + \frac{1+\sqrt{y^*}}{k(y^*-1)}\right)$-approximation
algorithm for every $y^* \ge A(k)$ and $k \in (\beta, 2]$. Therefore, with high probability and
for a large class of instances, this ratio improves the $\frac{1}{2}$-approximation algorithm
in [11]. The case $y^* < A(k)$ may be solved exactly in polynomial time (for fixed $k$) through the
algorithm for the parameterized MCSP in Section 3. Observe for instance that
$A(k_1) > A(k_2)$ for every $k_1, k_2 \in (\beta, 2]$ with $k_1 < k_2$. In other words, despite
the increase in the computational time of the algorithm for the parameterized
MCSP, small values of $k$ (for $k > \beta$) guarantee improved approximation ratios
for every $y^* \ge A(k)$. Formally, we proved the following result:

Theorem 2. Consider $y^*$ and $\mu$ as above. Then, for a given parameter $k \in (\beta, 2]$
with $\beta = y^*/\mu$, Algorithm 2 defines a randomized $\left(\frac{1}{k} + \frac{1+\sqrt{y^*}}{k(y^*-1)}\right)$-approximation
algorithm for the MCSP.
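The threshold $A(k)$ used in Algorithm 2 and in Theorem 2 is easy to evaluate numerically. The sketch below implements the closed form stated after inequality (13); the function name `threshold` is a choice made here. As a side remark that can be verified by hand, $4k(k-1)+1 = (2k-1)^2$, so the expression simplifies to $A(k) = \big(k/(k-1)\big)^2$.

```python
import math


def threshold(k):
    """A(k) = (2k(k-1) + 1 + sqrt(4k(k-1) + 1)) / (2 (k-1)^2) for k in (1, 2]."""
    if not 1.0 < k <= 2.0:
        raise ValueError("k must lie in the interval (1, 2]")
    numerator = 2.0 * k * (k - 1.0) + 1.0 + math.sqrt(4.0 * k * (k - 1.0) + 1.0)
    return numerator / (2.0 * (k - 1.0) ** 2)
```

For $k = 2$ this gives $A(2) = 4$, and the value grows as $k$ decreases towards 1 (for example $A(1.5) = 9$), which matches the observation that smaller $k$ makes the exact parameterized algorithm more expensive.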

Unfortunately, we do not know explicitly the value of $\beta = y^*/\mu$ since the
expectation $\mu$ is unknown and hard to compute. Moreover, we cannot guarantee
a parameter $k$ strictly less than 2. This problem is minimized if some good
estimations of $E(Z_H) = \mu$, and thus of $\beta$, are obtained. By running independent
experiments with respect to $Z_H$, the recent work of Dagum et al. [2] ensures,
for given $\delta$ and $\epsilon$, an estimator $\mu'$ of $\mu$ within a factor $1 + \epsilon$ and probability at
least $1 - \delta$. Therefore, if this approximation is performed in advance, and if we
assume k = min{2, an improved randomized approximation algorithm 

(for every instance of the MCSP) may be achieved if $k < 2$. Notice for instance
that, given an interval $(\beta, k)$, the proof of Theorem 2 guarantees the existence
of $\alpha \in (\beta, k)$, thus improving the performance ratio.

6 A Derandomized Algorithm 

Derandomization techniques convert a randomized algorithm into a deterministic 
one. Here, this is performed through the probabilistic method (introduced by 
Erdos and Spencer [5]). The main idea is to use the existence proof of some 
combinatorial structure for actually exhibiting this structure. 

The purpose of this section is to derandomize Algorithm 2 by using the
method of conditional expectations. In this case, the goal is to convert the
expected approximation ratio into a guaranteed approximation ratio while increasing
the running time by a factor that is polynomial in the input size. Basically,
the method of conditional expectations analyzes the behavior of a randomized
approximation algorithm as a computation tree, in such a way that each path
from the root to a leaf of this tree corresponds to a possible computation generated
by the algorithm.

In order to describe our derandomized procedure for the MCSP, consider
inequality (7). Then, it follows that:

$$Z_H = \sum_{i \in V} Z_i \le \sum_{i \in V} Y_i, \quad \text{where } Y_i = \min\Big\{1,\ \sum_{j \in M} \frac{a_{ij}}{n} X_{ij} - \sum_{j \in V} \frac{a_{ij}}{2n} X_{ij} + 1\Big\}$$

Recall that $X_{ii} = 1$ for every $i \in V$, and $X_{ij} = 1$, $\forall (i,j) \in E_1$. In addition,
suppose that all optional edges in $E_2 \setminus E_1$ are arbitrarily ordered and indexed
by $k = 1, \ldots, |E_2 \setminus E_1|$. In this section, the notation $x^{(k)}_{ij} = 1$ has the following
meaning: the $k$-th edge of $E_2 \setminus E_1$, with endpoints $i$ and $j$, belongs to the sandwich
graph $G = (V,E)$. Otherwise, $x^{(k)}_{ij} = 0$ means that $(i,j) \notin E$. For simplicity, we
will suppress indexes $i$ and $j$ and simply write $x^{(k)}$. Capital letters $X^{(k)}$ mean
that a value 0 or 1 was assigned to variable $x^{(k)}$ for some $k \in \{1, \ldots, |E_2 \setminus E_1|\}$.
Furthermore, the notation $E(Z_H \mid x^{(k)} = 0 \text{ or } 1)$ denotes the average value
produced by the randomized algorithm by computations that set $x^{(k)} = 0$ or 1.

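Thus, from the definition of conditional expectation and from its linearity
property one concludes that:

$$E(Z_H) = E(Z_H \mid x^{(1)} = 1)\,Pr(x^{(1)} = 1) + E(Z_H \mid x^{(1)} = 0)\,Pr(x^{(1)} = 0)$$

$$\le \max\Big\{\sum_{i \in V} E(Z_i \mid x^{(1)} = 1);\ \sum_{i \in V} E(Z_i \mid x^{(1)} = 0)\Big\} = \sum_{i \in V} E(Z_i \mid X^{(1)})$$

$$\le \max\Big\{\sum_{i \in V} E(Z_i \mid X^{(1)}, x^{(2)} = 1);\ \sum_{i \in V} E(Z_i \mid X^{(1)}, x^{(2)} = 0)\Big\}$$

$$= \sum_{i \in V} E(Z_i \mid X^{(1)}, X^{(2)})$$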

By repeating this process for every edge in E 2 \E\, one obtains: 

E{Zh) < max{^P(Zi | Xd),... ^ | ^ 

ieV ieV 

Therefore, within this framework, a guaranteed performance ratio is poly- 
nomially attained through an expected approximation ratio, gathering, in this 
way, an improved deterministic approximation solution. 

Now, from the definition of conditional expectation,

$$E(Z_i \mid X^{(1)}, \ldots, X^{(k-1)}, x^{(k)} = 0 \text{ or } 1) = Pr(Z_i = 1 \mid X^{(1)}, \ldots, X^{(k-1)}, x^{(k)} = 0 \text{ or } 1),$$

for every $i \in V$ and $k = 1, \ldots, |E_2 \setminus E_1|$. Unfortunately, for the MCSP, these
probabilities are hard to compute. Lemma 4 will give us an alternate way to deal
with these expectations without explicitly considering conditional probabilities.

Lemma 4. Suppose that $Z_i$ and $Y_i$, for some $i \in V$, are random variables as
described above. Then:

(a) $E(Z_i \mid X^{(1)}, \ldots, X^{(k-1)}, x^{(k)} = 1) \ge E(Z_i \mid X^{(1)}, \ldots, X^{(k-1)}, x^{(k)} = 0)$
    if and only if
    $E(Y_i \mid X^{(1)}, \ldots, X^{(k-1)}, x^{(k)} = 1) \ge E(Y_i \mid X^{(1)}, \ldots, X^{(k-1)}, x^{(k)} = 0)$

(b) $E(Z_i \mid X^{(1)}, \ldots, X^{(k-1)}, x^{(k)} = 0) \ge E(Z_i \mid X^{(1)}, \ldots, X^{(k-1)}, x^{(k)} = 1)$
    if and only if
    $E(Y_i \mid X^{(1)}, \ldots, X^{(k-1)}, x^{(k)} = 0) \ge E(Y_i \mid X^{(1)}, \ldots, X^{(k-1)}, x^{(k)} = 1)$

Proof: We will prove item (a); the proof of (b) follows analogously. Consider
without loss of generality that $E(Z_i \mid X^{(1)}, \ldots, X^{(k-1)}, x^{(k)} = 1) \ge E(Z_i \mid
X^{(1)}, \ldots, X^{(k-1)}, x^{(k)} = 0)$. Thus, since $E(Z_H) = y^*/\beta$ for some $\beta \in [1,2)$, it
follows from inequalities (7)-(8) and from the definition of conditional expectations
that:

$$E(Z_i \mid X^{(1)}, \ldots, X^{(k-1)}, x^{(k)} = 0 \text{ or } 1) = (1/\beta)\, E(Y_i \mid X^{(1)}, \ldots, X^{(k-1)}, x^{(k)} = 0 \text{ or } 1).$$

Thus:

$$(1/\beta)\, E(Y_i \mid X^{(1)}, \ldots, X^{(k-1)}, x^{(k)} = 1) \ge (1/\beta)\, E(Y_i \mid X^{(1)}, \ldots, X^{(k-1)}, x^{(k)} = 0).$$

By multiplying both sides by $\beta$, we get the desired inequality

$$E(Y_i \mid X^{(1)}, \ldots, X^{(k-1)}, x^{(k)} = 1) \ge E(Y_i \mid X^{(1)}, \ldots, X^{(k-1)}, x^{(k)} = 0).$$

The converse is obtained in the same way by first multiplying this last inequality
by $1/\beta$. □

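Now, from Lemma 4, it follows that $E(Z_H)$ is less than or equal to

$$\max\Big\{\sum_{i \in V} E(Y_i \mid X^{(1)}, \ldots, X^{(k-1)}, x^{(k)} = 1);\ \sum_{i \in V} E(Y_i \mid X^{(1)}, \ldots, X^{(k-1)}, x^{(k)} = 0)\Big\},$$

for $k = 1, \ldots, |E_2 \setminus E_1|$.

We repeat the process above for every optional edge in $E_2 \setminus E_1$. Therefore, the
sequence $X^{(1)}, \ldots, X^{(|E_2 \setminus E_1|)}$ is obtained deterministically in polynomial time
while improving the approximation ratio.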
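The edge-by-edge fixing described above can be illustrated on tiny instances. Note the substitution: instead of the LP-based estimators $E(Y_i \mid \cdot)$ used in the text, this sketch computes the conditional expectation of the number of controlled vertices exactly by enumerating all completions, which takes exponential time and is meant only to show the mechanics of the method of conditional expectations. The control rule (at least half of $N[v]$ in $M$) and all function names are assumptions.

```python
from itertools import product


def count_controlled(vertices, edges, M):
    """Number of vertices v with |N[v] ∩ M| >= |N[v] \\ M| (assumed control rule)."""
    total = 0
    for v in vertices:
        nb = {v}
        for e in edges:
            if v in e:
                nb |= e
        in_m = len(nb & M)
        total += in_m >= len(nb) - in_m
    return total


def expected_value(vertices, E1, optional, probs, fixed, M):
    """Exact E[#controlled | decisions in `fixed`], by brute-force enumeration."""
    free = [e for e in optional if e not in fixed]
    total = 0.0
    for bits in product((0, 1), repeat=len(free)):
        weight = 1.0
        chosen = {e for e, b in fixed.items() if b}
        for e, b in zip(free, bits):
            weight *= probs[e] if b else 1.0 - probs[e]
            if b:
                chosen.add(e)
        total += weight * count_controlled(vertices, set(E1) | chosen, M)
    return total


def derandomize(vertices, E1, optional, probs, M):
    """Fix optional edges one at a time, never decreasing the conditional expectation."""
    fixed = {}
    for e in optional:
        with_e = expected_value(vertices, E1, optional, probs, {**fixed, e: 1}, M)
        without_e = expected_value(vertices, E1, optional, probs, {**fixed, e: 0}, M)
        fixed[e] = 1 if with_e >= without_e else 0
    return {e for e, b in fixed.items() if b}
```

Because each step keeps the larger of the two conditional expectations, the deterministic edge set returned by `derandomize` controls at least as many vertices as the randomized rounding does in expectation.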

From the preceding section, we have described a randomized algorithm whose
expectation $E(Z_H)$ is greater than or equal to $\left(\frac{1}{k} + \frac{1+\sqrt{y^*}}{k(y^*-1)}\right) y^*$ for some $k \in (\beta, 2]$
conveniently chosen. Since we expect to obtain a deterministic procedure, it
suffices to consider (in the worst case) $k = 2$ and $y^* = O(n)$. Observe, from
the preceding section, that by setting $k = 2$ one obtains $A(2) = 4$. This will
give us (for an arbitrary instance) an improved deterministic polynomial time
approximation algorithm with performance ratio equal to $\frac{1}{2} + \frac{1+\sqrt{n}}{2(n-1)}$.

Algorithm 3: derandomized algorithm for the mcsp 

1. compute $z_1$ using Algorithm 1

2. solve the linear programming relaxation and return $y^*$

3. if $y^* < 4$
   then compute $z_H$ by executing the algorithm for the parameterized
        MCSP in Section 3 for parameters $A = \lfloor y^* \rfloor, \lfloor y^* \rfloor - 1, \ldots, 1, 0$, until
        obtaining a Yes answer
   else for $k = 1, \ldots, |E_2 \setminus E_1|$ do
            if $E(Y_k \mid X^{(1)}, \ldots, X^{(k-1)}, x^{(k)} = 1) \ge E(Y_k \mid X^{(1)}, \ldots, X^{(k-1)}, x^{(k)} = 0)$
               then $X^{(k)} \leftarrow 1$
               else $X^{(k)} \leftarrow 0$
        compute $z_2$ by using the integer feasible solution $X$
        $z_H \leftarrow \max\{z_1, z_2\}$

4. return $z_H$

Observe above that expectations $E(Y_k \mid X^{(1)}, \ldots, X^{(k-1)}, x^{(k)} = 0 \text{ or } 1)$
are easily obtained. This may be accomplished in polynomial time by solving
a linear programming problem for every optional edge (set to 0 or 1).
If $L$ denotes the length of the input, the linear relaxation has complexity
$O(n^3 L)$ [18], and thus the total complexity of Algorithm 3 will be equal to
$O(\max\{n^4, |E_2 \setminus E_1|\, n^3 L\})$. Moreover, from Theorem 2, it is straightforward to
observe that an improvement of the approximation ratio may be attained if
good upper bounds are obtained via the linear relaxation. This may be accomplished,
for example, through the use of new reduction rules and/or through the
use of additional cutting planes. Notice for instance that, even in the worst case,
when $y^* = O(n)$, one obtains an improved approximation ratio. Formally, we can
establish the following result:

Theorem 3. Algorithm 3 guarantees in polynomial time an approximation ratio
equal to $\frac{1}{2} + \frac{1+\sqrt{n}}{2(n-1)}$ for $n > 4$.
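To get a feeling for how much the guarantee improves on $\frac{1}{2}$, the ratio $\frac{1}{k} + \frac{1+\sqrt{n}}{k(n-1)}$ obtained from inequality (14) with $y^* = n$ can be tabulated. The helper below is an illustrative sketch (the name `ratio` is hypothetical); with the default $k = 2$ it evaluates the worst-case expression discussed in this section.

```python
import math


def ratio(n, k=2.0):
    """Evaluate 1/k + (1 + sqrt(n)) / (k (n - 1)) for n > 1."""
    if n <= 1:
        raise ValueError("n must be greater than 1")
    return 1.0 / k + (1.0 + math.sqrt(n)) / (k * (n - 1.0))
```

For $n = 9$ the value is $0.75$, while for $n = 10^6$ it is roughly $0.5005$: the additive gain over $\frac{1}{2}$ is positive for every $n$ but vanishes as $n$ grows, in line with the limit observed after inequality (14).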

7 Conclusions 

We presented an improved deterministic polynomial time approximation al- 
gorithm for the Max-Controlled Set Problem through the use of randomized 
rounding and derandomization techniques. As far as we know, this is the best 
approximation result for the MCSP. This improves the $\frac{1}{2}$-approximation procedure
presented by Makino, Yamashita and Kameda [11]. A new linear integer
programming formulation was presented to define tuned probabilities in our ran- 
domized procedure. Through the use of the probabilistic method, we converted 
a probabilistic proof of existence of an approximated solution into an efficient 
deterministic algorithm for actually constructing this solution. Additionally, we 
show that if some good estimations of expectation are obtained in advance, some 
improved approximation ratios may be attained. 

As future work, an interesting question is to decide whether the parameterized
MCSP is Fixed Parameter Tractable (FPT). (A problem with parameter $A$
is FPT if it admits an $O(f(A)\, n^{\gamma})$ time algorithm, for some function $f$ and some
constant $\gamma$ independent of $A$. For details, see [3].) Obtaining non-approximability
results for the MCSP and using semidefinite programming relaxation in the
randomized rounding procedure are also interesting directions for research.

Acknowledgments. We thank Marcos Kiwi and Prabhakar Raghavan for their
valuable comments and pointers to the literature.

Abstract. This paper describes a GRASP with path-relinking heuristic
for the quadratic assignment problem. GRASP is a multi-start procedure,
where different points in the search space are probed with local
search for high-quality solutions. Each iteration of GRASP consists of the
construction of a randomized greedy solution, followed by local search,
starting from the constructed solution. Path-relinking is an approach to
integrate intensification and diversification in search. It consists in exploring
trajectories that connect high-quality solutions. The trajectory is
generated by introducing in the initial solution attributes of the guiding
solution. Experimental results illustrate the effectiveness of GRASP with
path-relinking over pure GRASP on the quadratic assignment problem.



1 Introduction 

The quadratic assignment problem (QAP) was first proposed by Koopmans and
Beckmann [10] in the context of the plant location problem. Given $n$ facilities,
represented by the set $F = \{f_1, \ldots, f_n\}$, and $n$ locations represented by the
set $L = \{l_1, \ldots, l_n\}$, one must determine to which location each facility must
be assigned. Let $A^{n \times n} = (a_{ij})$ be a matrix where $a_{ij} \in \Re^+$ represents the flow
between facilities $f_i$ and $f_j$. Let $B^{n \times n} = (b_{ij})$ be a matrix where entry $b_{ij} \in \Re^+$
represents the distance between locations $l_i$ and $l_j$. Let $p : \{1 \ldots n\} \to \{1 \ldots n\}$
be an assignment and define the cost of this assignment to be

$$c(p) = \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij}\, b_{p(i),p(j)}.$$

In the QAP, we want to find a permutation vector $p \in \Pi_n$ that minimizes
the assignment cost, i.e. $\min c(p)$, subject to $p \in \Pi_n$, where $\Pi_n$ is the set of all
permutations of $\{1, \ldots, n\}$. The QAP is well known to be strongly NP-hard [18].
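The cost function $c(p)$ translates directly into code. The sketch below uses 0-based indices, so `p[i]` is the location assigned to facility `i`; the function name `qap_cost` and the small flow and distance matrices in the usage note are made up for illustration.

```python
def qap_cost(p, A, B):
    """c(p) = sum over i, j of A[i][j] * B[p[i]][p[j]] (0-based indices)."""
    n = len(p)
    return sum(A[i][j] * B[p[i]][p[j]] for i in range(n) for j in range(n))
```

For example, with flows `A = [[0, 3, 1], [3, 0, 2], [1, 2, 0]]` and distances `B = [[0, 5, 4], [5, 0, 6], [4, 6, 0]]`, the identity assignment `[0, 1, 2]` costs 62, while `[1, 0, 2]` costs 58.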

GRASP, or greedy randomized adaptive search procedure [5,6,8,17], has
been previously applied to the QAP [12,14,15]. For a survey on heuristics and
metaheuristics applied to the QAP, see Voß [19]. In this paper, we present a new
GRASP for the QAP, which makes use of path-relinking as an intensification
mechanism. In Section 2, we briefly review GRASP and path-relinking, and give
a description of how both are combined to find approximate solutions to the
QAP. Experimental results with benchmark instances are presented in Section 3.
Finally, in Section 4 we draw some concluding remarks.

2 GRASP and Path-Relinking 

GRASP is a multi-start procedure, where different points in the search space are 
probed with local search for high-quality solutions. Each iteration of GRASP 
consists of the construction of a randomized greedy solution, followed by lo- 
cal search, starting from the constructed solution. A high-level description of 
GRASP for QAP, i.e. solving $\min c(p)$ for $p \in \Pi_n$, is given in Algorithm 1.



Algorithm 1 GRASP for minimization 
1: c* ← ∞
2: while stopping criterion not satisfied do
3:    p ← GreedyRandomized()
4:    p ← LocalSearch(p)
5:    if c(p) < c* then
6:       p* ← p
7:       c* ← c(p)
8:    end if
9: end while
10: return p*
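Algorithm 1 maps onto a short generic loop. In this sketch the stopping criterion is simply a fixed iteration count, and `construct`, `local_search` and `cost` are caller-supplied functions; these names and the seeding convention are assumptions made here, not part of the original description.

```python
import math
import random


def grasp(construct, local_search, cost, iterations, seed=0):
    """Multi-start loop: build a randomized solution, improve it, keep the best."""
    rng = random.Random(seed)
    best, best_cost = None, math.inf
    for _ in range(iterations):
        p = local_search(construct(rng))   # construction followed by local search
        c = cost(p)
        if c < best_cost:                  # strict improvement, as in line 5
            best, best_cost = p, c
    return best, best_cost
```

Passing the random generator to `construct` keeps the run reproducible for a given seed, which is convenient when comparing variants of the heuristic.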



The greedy randomized construction and the local search used in the new al- 
gorithm are similar to the ones described in [12]. The construction phase consists 
of two stages. 

In stage 1, two initial assignments are made: facility $F_i$ is assigned to location
$L_k$ and facility $F_j$ is assigned to location $L_l$. To make the assignment, elements
of the distance matrix are sorted in increasing order:

$$b_{i(1),j(1)} \le b_{i(2),j(2)} \le \cdots \le b_{i(n),j(n)},$$

while the elements of the flow matrix are sorted in decreasing order:

$$a_{k(1),l(1)} \ge a_{k(2),l(2)} \ge \cdots \ge a_{k(n),l(n)}.$$

The product elements

$$a_{k(1),l(1)} \cdot b_{i(1),j(1)},\ a_{k(2),l(2)} \cdot b_{i(2),j(2)},\ \ldots,\ a_{k(n),l(n)} \cdot b_{i(n),j(n)}$$

are sorted and the term $a_{k(q),l(q)} \cdot b_{i(q),j(q)}$ is selected at random from among the
smallest elements. This product corresponds to the initial assignments: facility
$F_{k(q)}$ is assigned to location $L_{i(q)}$ and facility $F_{l(q)}$ is assigned to location $L_{j(q)}$.

In stage 2, the remaining $n - 2$ assignments of facilities to locations are made,
one facility/location pair at a time. Let $\Gamma = \{(i_1,k_1), (i_2,k_2), \ldots, (i_q,k_q)\}$ denote
the first $q$ assignments made. Then, the cost of assigning facility $F_j$ to location $L_l$
is $c_{j,l} = \sum_{(i,k) \in \Gamma} a_{i,j}\, b_{k,l}$. To make the $(q+1)$-th assignment, select at random an
assignment from among the feasible assignments with smallest costs and add the
assignment to $\Gamma$.

Once a solution is constructed, local search is applied to it to try to improve
its cost. For each pair of assignments $(F_i \to L_k;\ F_j \to L_l)$ in the current solution,
check if the swap $(F_i \to L_l;\ F_j \to L_k)$ improves the cost of the assignment. If so,
make the swap, and repeat. A solution is locally optimal when no swap improves
the cost of the solution.
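A minimal version of this pairwise-exchange local search is sketched below. For clarity it recomputes the full cost after every trial swap, which costs $O(n^2)$ per evaluation; practical implementations usually evaluate only the change caused by the swap. The function names are hypothetical and indices are 0-based.

```python
def qap_cost(p, A, B):
    """c(p) = sum over i, j of A[i][j] * B[p[i]][p[j]]."""
    n = len(p)
    return sum(A[i][j] * B[p[i]][p[j]] for i in range(n) for j in range(n))


def local_search(p, A, B):
    """Swap the locations of two facilities while some swap lowers the cost."""
    p = list(p)
    best = qap_cost(p, A, B)
    improved = True
    while improved:
        improved = False
        for i in range(len(p) - 1):
            for j in range(i + 1, len(p)):
                p[i], p[j] = p[j], p[i]          # try the swap
                c = qap_cost(p, A, B)
                if c < best:
                    best = c                     # keep the improving swap
                    improved = True
                else:
                    p[i], p[j] = p[j], p[i]      # undo a non-improving swap
    return p, best
```

The loop stops only after a full pass over all pairs finds no improving swap, so the returned assignment is locally optimal with respect to this neighborhood.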

Path-relinking [9] is an approach to integrate intensification and diversifi- 
cation in search. It consists in exploring trajectories that connect high-quality 
solutions. The trajectory is generated by introducing in the initial solution, at- 
tributes of the guiding solution. It was first used in connection with GRASP 
by Laguna and Marti [11]. A recent survey of GRASP with path-relinking is 
given in Resende and Ribeiro [16]. The objective of path-relinking is to integrate 
features of good solutions, found during the iterations of GRASP, into new solu- 
tions generated in subsequent iterations. In pure GRASP (i.e. GRASP without 
path-relinking), all iterations are independent and therefore most good solutions 
are simply “forgotten.” Path-relinking tries to change this, by retaining previous 
solutions and using them as “guides” to speed up convergence to a good-quality 
solution. 

Path-relinking uses an elite set P, in which good solutions found by the 
GRASP are saved to be later combined with other solutions produced by the 
GRASP. The maximum size of the elite set is an input parameter. During path-
relinking, one of the solutions q ∈ P is selected to be combined with the current
GRASP solution p. The elements of q are incrementally incorporated into p. This
relinking process can result in an improved solution, since it explores distinct 
neighborhoods of high-quality solutions. 

Algorithm 2 shows the steps of GRASP with path-relinking. Initially, the elite 
set P is empty, and solutions are added if they are different from the solutions 
already in the set. Once the elite set is full, path-relinking is done after each 
GRASP construction and local search. 

A solution q ∈ P is selected, at random, to be combined, through path-
relinking, with the GRASP solution p. Since we want to favor long paths, which
have a better chance of producing good solutions, we would like to choose an elite
solution q with a high degree of differentiation with respect to p. For each element
q ∈ P, let d(q) denote the number of facilities in q and p that have different
assignments, and let $D = \sum_{q \in P} d(q)$. A solution q is selected from the elite set
with probability d(q)/D. The selected solution q is called the guiding solution.
The output of path-relinking, r, is at least as good as the solutions p and q that
were combined by path-relinking.
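
A minimal sketch of this biased selection is given below, assuming solutions are stored as equal-length lists. The function names `hamming` and `select_guide` are hypothetical, and the fallback to a uniform choice when every elite solution coincides with p (so that D = 0) is an assumption added here, not something stated in the text.

```python
import random

def hamming(p, q):
    """d(q): number of positions at which the two assignments differ."""
    return sum(1 for a, b in zip(p, q) if a != b)

def select_guide(elite, p, rng=random):
    """Pick q from the elite set with probability d(q)/D."""
    weights = [hamming(p, q) for q in elite]
    if sum(weights) == 0:        # assumption: uniform choice if all equal p
        return rng.choice(elite)
    return rng.choices(elite, weights=weights, k=1)[0]
```

Elite solutions that differ from p in more positions are chosen more often, which favors the longer relinking paths mentioned above.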

If the combined solution r is not already in the elite set and its cost is not 
greater than the cost of the highest-cost elite set solution, then it is inserted into
Algorithm 2 GRASP with path-relinking

P ← ∅
while stopping criterion not satisfied do
    p ← GreedyRandomized()
    p ← LocalSearch(p)
    if P is full then
        Select elite solution q ∈ P at random
        r ← PathRelinking(p, q)
        if c(r) ≤ max{c(q) | q ∈ P} and r ∉ P then
            Let P' = {q ∈ P | c(q) ≥ c(r)}
            Let q' ∈ P' be the most similar solution to r
            P ← P ∪ {r}
            P ← P \ {q'}
        end if
    else
        if p ∉ P then
            P ← P ∪ {p}
        end if
    end if
end while
return p* = argmin{c(p) | p ∈ P}



the elite set. Among the elite set solutions having cost not smaller than c(r), the 
one most similar to r is deleted from the set. This scheme keeps the size of the 
elite set constant and attempts to maintain the set diversified. 
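
The elite set update rule can be sketched as follows. The function name `update_elite`, the list-of-tuples representation, and the caller-supplied `cost` function are assumptions made for illustration; similarity is measured here as the number of positions in which two solutions agree, which is one plausible reading of "most similar".

```python
def update_elite(elite, r, cost, max_size):
    """Return the elite set (a list of solutions) after offering it solution r."""
    if r in elite:
        return elite
    if len(elite) < max_size:           # set not yet full: simply add
        return elite + [r]
    c_r = cost(r)
    if c_r > max(cost(q) for q in elite):
        return elite                    # r is worse than every elite solution
    # Candidates for removal: elite solutions with cost not smaller than c(r).
    candidates = [q for q in elite if cost(q) >= c_r]
    # Most similar to r = fewest differing positions.
    victim = min(candidates, key=lambda q: sum(a != b for a, b in zip(q, r)))
    return [q for q in elite if q is not victim] + [r]
```

Removing the most similar of the no-better solutions, rather than simply the worst, is what keeps the set from collapsing onto near-identical solutions.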

We next give details on our implementation of path-relinking for the QAP,
shown in Algorithm 3. Let p be the mapping implied by the current solution and q
the mapping implied by the guiding solution. For each location i = 1, . . . , n, path-
relinking attempts to exchange facility p(i) assigned to location i in the current
solution with facility q(i) assigned to i in the guiding solution. To keep the
mapping p feasible, it exchanges p(i) with p(k), where p(k) = q(i).

The change in objective function caused by this swap is found using the 
function evalij, which is limited to the part of the objective function affected
by these elements. If the change is positive, then the algorithm applies local 
search to the resulting solution. This is done only for positive changes in the 
objective value function to reduce the total computational time spent in local 
search. The algorithm also checks if the generated solution is better than the 
best known solution and, if so, saves it. 
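
The swap sequence used by path-relinking can be sketched as below. This is a simplified sketch under stated assumptions: the name `path_relinking` and the caller-supplied `cost` function are hypothetical, the local search applied to intermediate solutions is omitted, and every intermediate solution is evaluated in full instead of using an incremental evaluation such as evalij.

```python
def path_relinking(p, q, cost):
    """Walk from p toward the guiding solution q; return the best solution visited."""
    p = list(p)
    best, best_cost = list(p), cost(p)
    for i in range(len(p)):
        if p[i] != q[i]:
            j = p.index(q[i])            # position currently holding q[i]
            p[i], p[j] = p[j], p[i]      # the swap keeps p a permutation
            c = cost(p)
            if c < best_cost:
                best, best_cost = list(p), c
    return best, best_cost
```

Because the walk starts at p and ends at q, the returned solution is never worse than either endpoint, in line with the property stated earlier for the output r.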

The path-relinking procedure described above can be further generalized, by 
observing that path-relinking can also be done in the reverse direction, from 
the solution in the elite set to the current solution. This modification of the 
path-relinking procedure is called reverse path-relinking. In our implementation, 
a reverse path-relinking is also applied at each iteration. As a last step, we use a 
post-optimization procedure where path-relinking is applied among all solutions 
Algorithm 3 Path-relinking

Require: p, the current GRASP solution; q, the guiding solution
c* ← ∞
for i ← 1, . . . , n do
    if p(i) ≠ q(i) then
        Let j be such that p(j) = q(i)
        δ ← evalij(p, i, j)
        τ ← p(i)
        p(i) ← p(j)
        p(j) ← τ
        if δ > 0 then
            r ← LocalSearch(p)
            if c(r) < c* then
                r* ← r
            end if
        end if
    end if
end for
return r*



of the elite set. This procedure, which can be viewed as an extended local search, 
is repeated while an improvement in the best solution is possible. 

One of the computational burdens associated with path-relinking is the lo- 
cal search done on all new solutions found during path-relinking. To ameliorate 
this, we modified the local search phase proposed in GRASP [12] by using a 
non-exhaustive improvement phase. In the local search in [12], each pair of as- 
signments was exchanged until the best one was found. In our implementation, 
only one of the assignments is verified and exchanged with the one that brings 
the best improvement. This reduces the complexity of local search by a factor of 
n, leading to an $O(n^2)$ procedure. This scheme is used after the greedy randomized
construction and at each iteration during path-relinking. 

To enhance the quality of local search outside path-relinking, after the mod- 
ified local search discussed above is done, the algorithm performs a random 
3-exchange step, equivalent to changing, at random, two pairs of elements in the
solution. The algorithm then continues with the local search, until a local opti- 
mum is found. This type of random shaking is similar to what is done in variable 
neighborhood search [13]. 

3 Computational Experiments 

Before we present the results, we first describe a plot used in several of our 
papers to experimentally compare different randomized algorithms or different 
versions of the same randomized algorithm [1,3,7]. This plot shows empirical 
distributions of the random variable time to target solution value. To plot the 
empirical distribution, we fix a solution target value and run each algorithm T 
[Plot: instance tho30 (target value = 149936); horizontal axis: time to target value, in seconds on an SGI Challenge 196MHz R10000]

Fig. 1. Probability distribution of time-to-target-value on instance tho30 from
QAPLIB for GRASP and GRASP with path-relinking.



independent times, recording the running time when a solution with cost at least
as good as the target value is found. For each algorithm, we associate with the
i-th sorted running time ($t_i$) a probability $p_i = (i - \frac{1}{2})/T$, and plot the points
$z_i = (t_i, p_i)$, for i = 1, . . . , T. Figure 1 shows one such plot comparing the pure
GRASP with the GRASP with path-relinking for QAPLIB instance tho30 with
target (optimal) solution value of 149936. The figure shows clearly that GRASP
with path-relinking (GRASP+PR) is much faster than pure GRASP to find a
solution with cost 149936. For instance, the probability of finding such a solution 
in less than 100 seconds is about 55% with GRASP with path-relinking, while it 
is about 10% with pure GRASP. Similarly, with probability 50% GRASP with 
path-relinking finds such a target solution in less than 76 seconds, while for pure 
GRASP, with probability 50% a solution is found in less than 416 seconds. 
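
The points of such a time-to-target plot can be computed as in the short sketch below, which pairs the i-th smallest of T running times with the probability (i − 1/2)/T. The function name `ttt_points` is hypothetical, and the running times are assumed to be already collected in a list.

```python
def ttt_points(times):
    """Return the points (t_i, p_i), with p_i = (i - 1/2)/T, for sorted times."""
    ts = sorted(times)
    T = len(ts)
    return [(t, (i - 0.5) / T) for i, t in enumerate(ts, start=1)]
```

For example, `ttt_points([3.0, 1.0, 2.0, 4.0])` returns `[(1.0, 0.125), (2.0, 0.375), (3.0, 0.625), (4.0, 0.875)]`.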

In [3], Aiex, Resende, and Ribeiro showed experimentally that the distri- 
bution of the random variable time to target solution value for a GRASP is a 
shifted exponential. The same result holds for GRASP with path-relinking [2]. 
Figure 2 illustrates this result, depicting the superimposed empirical and theo- 
retical distributions observed for one of the cases studied in [3]. 

In this paper, we present extensive experimental results, showing that path- 
relinking substantially improves the performance of GRASP. We compare an 
implementation of GRASP with and without path-relinking. The instances are 
taken from QAPLIB [4], a library of quadratic assignment test problems. 
[Plot: horizontal axis: time to target value (seconds)]

Fig. 2. Superimposed empirical and theoretical distributions (times to target values
measured in seconds on an SGI Challenge computer with 196 MHz R10000 processors).



For each instance considered in our experiments, we make T = 100 inde- 
pendent runs with GRASP with and without path-relinking, recording the time 
taken for each algorithm to find the best known solution for each instance. (Due 
to the length of the runs on a few of the instances, fewer than 100 runs were 
done.) The probability distributions of time-to-target-value for each algorithm 
are plotted for each instance considered. We consider 91 instances from QAPLIB. 
Since it is impractical to fit 91 plots in this paper, we show the entire collection 
of plots at the URL http://www.research.att.com/~mgcr/exp/gqapspr. In 
this paper, we show only a representative set of plots. 

Table 1 summarizes the runs in the representative set. The numbers appearing
in the names of the instances indicate the dimension (n) of the problem.
For each instance, the table lists for each algorithm the number of runs, and the 
times in seconds for 25%, 50%, and 75% of the runs to find a solution having 
the target value. 

The distributions are depicted in Figures 1 and 3 to 9. 

The table and figures illustrate the effect of path-relinking on GRASP. On 
all instances, path-relinking improved the performance of GRASP. The improve- 
ment went from about a factor of two speedup to over a factor of 60. 
Table 1. Summary of experiments. For each instance, the table lists for each algorithm
the number of independent runs, and the time (in seconds) for 25%, 50%, and 75% of
the runs to find the target solution value.

                        GRASP                              GRASP with PR
problem   runs      25%      50%       75%      runs     25%     50%     75%
esc32h     100       .5      1.4       2.5       100      .2      .5     1.0
bur26h     100      2.5      1.4       2.5       100      .7     1.4     2.8
kra30a     100       47      115       241       100      11      26      57
tho30      100      208      410       944       100      30      76     154
nug30      100      583     1334      2841       100      63     149     283
chr22a     100      723     1948      4188       100     234     449     726
lipa40a     75   12,366   23,841    39,649       100     360     526     708
ste36a      17   27,034   91,075   135,011       100    1787    4047    8503

[Plot: instance esc32h; horizontal axis: time to target value, in seconds on an SGI Challenge 196MHz R10000]

Fig. 3. Probability distribution of time-to-target-value on instance esc32h from
QAPLIB for GRASP and GRASP with path-relinking.

[Plot: instance bur26h]

Fig. 4. Probability distribution of time-to-target-value on instance bur26h from
QAPLIB for GRASP and GRASP with path-relinking.

[Plot: instance kra30a]

Fig. 5. Probability distribution of time-to-target-value on instance kra30a from
QAPLIB for GRASP and GRASP with path-relinking.

[Plot: instance nug30 (target value = 6124)]

Fig. 6. Probability distribution of time-to-target-value on instance nug30 from
QAPLIB for GRASP and GRASP with path-relinking.

Fig. 7. Probability distribution of time-to-target-value on instance chr22a from
QAPLIB for GRASP and GRASP with path-relinking.

Fig. 8. Probability distribution of time-to-target-value on instance lipa40a from
QAPLIB for GRASP and GRASP with path-relinking.

Fig. 9. Probability distribution of time-to-target-value on instance ste36a from
QAPLIB for GRASP and GRASP with path-relinking.
4 Concluding Remarks

In this paper, we propose a GRASP with path-relinking for the quadratic as-
signment problem. The algorithm was implemented in the ANSI-C language
and was extensively tested. Computational results show that path-relinking
speeds up convergence, sometimes by up to two orders of magnitude. The
source code for both GRASP and GRASP with path-relinking, as well as
the plots for the extended experiment, can be downloaded from the URL
http://www.research.att.com/~mgcr/exp/gqapspr.
Abstract. The minimum transmission radius R that preserves ad hoc network
connectivity is equal to the longest edge in the minimum spanning tree. This
article proposes to use the longest LMST (local MST, a recently proposed
message-free approximation of MST) edge to approximate R using a wave
propagation quasi-localized algorithm. Despite the small number of additional edges in
LMST with respect to MST, they can extend R by about 33% of its range on
networks with up to 500 nodes. We then prove that MST is a subset of LMST and
describe a quasi-localized scheme for constructing MST from LMST. The
algorithm eliminates LMST edges which are not in MST by a loop breakage
procedure, which iteratively follows dangling edges from leaves to LMST loops, and
breaks loops by eliminating their longest edges, until the procedure finishes at a
single leader node, which then broadcasts R to other nodes.



1 Introduction 

Due to their potential applications in various situations such as battlefield, emergency
relief, environment monitoring, etc., wireless ad hoc networks have recently emerged
as a prime research topic. Wireless networks consist of a set of wireless nodes which
are spread over a geographical area. These nodes are able to perform processing and
are capable of communicating with each other by means of a wireless ad hoc network.
Wireless nodes cooperate to perform data communication tasks, and the network may
function in both urban and remote environments.

Energy conservation is a critical issue in wireless networks for the node and the
network life, as the nodes are powered by batteries only. Each wireless node typically
has transmission and reception processing capabilities. To transmit a signal from a
node A to another node B, the consumed power (in the most common power-attenuation
model) is proportional to $\|AB\|^{\alpha} + c$, where $\|AB\|$ is the Euclidean distance between A
and B; $\alpha$ is a real constant between 2 and 5 which depends on the transmission
environment, and the constant c represents the minimal power to receive a signal, and the power
to store and then process that signal. For simplicity, this overhead cost can be
integrated into one cost, which is almost the same for all nodes. The expression represents
merely the minimal power, assuming that the transmission radius is adjusted to the
distance between nodes. While adjusting the transmission radius is technologically
feasible, the medium access layer (e.g. the standard IEEE 802.11) works properly only when
all nodes use the same transmission radius. Otherwise, the hidden terminal problem is
more difficult to control and magnifies its already negative impact on the network
performance.

Ad hoc networks are normally modeled by unit graphs, where two nodes are con- 
nected if and only if their distance is at most R, where R is the transmission radius, 
equal for all nodes. Finding the minimum R that preserves connectivity is an impor- 
tant problem for network functionality. Larger than necessary values of R cause 
communication interference and consumption of increased energy, while smaller val- 
ues of R may disable data communication tasks such as routing and broadcasting. 

The two objectives that have been mainly considered in the literature are: minimizing
the maximum power at each node (as in this paper) and minimizing the total power
assigned if transmission ranges can be adjusted (for example, [11] shows that MST
used as nonuniform power assignment yields a factor of 2 approximation). Instead of
transmitting using the maximum possible power, nodes in an ad hoc network may 
collaboratively determine the optimal common transmission power. It corresponds to 
the minimal necessary transmission radius to preserve network connectivity. It was 
recognized [7,8,10] that the minimum value of R that preserves the network connec- 
tivity is equal to the longest edge in the minimum spanning tree (MST). However, all 
existing solutions for finding R rely on algorithms that require global network knowl- 
edge or inefficient straightforward distributed adaptations of centralized algorithms. 
Therefore almost all existing solutions for energy-efficient broadcast communications 
are globalized, meaning that each node needs global network information. 
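
For reference, the centralized computation that the localized schemes try to avoid can be sketched as follows: with global knowledge of all node positions, the minimum R is the longest edge of the Euclidean MST. The sketch uses Prim's algorithm on the complete graph; the function name `min_transmission_radius` and the representation of nodes as coordinate tuples are assumptions made for illustration.

```python
import math

def min_transmission_radius(points):
    """Longest edge of the Euclidean MST, computed with Prim's algorithm."""
    n = len(points)
    if n < 2:
        return 0.0
    in_tree = [False] * n
    dist = [math.inf] * n        # cheapest known link from the tree to each node
    dist[0] = 0.0
    longest = 0.0
    for _ in range(n):
        # Pick the node outside the tree that is closest to the tree.
        u = min((k for k in range(n) if not in_tree[k]), key=dist.__getitem__)
        in_tree[u] = True
        longest = max(longest, dist[u])
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < dist[v]:
                    dist[v] = d
    return longest
```

For three collinear nodes at (0, 0), (1, 0) and (3, 0), the MST edges have lengths 1 and 2, so the function returns 2.0; any smaller common radius would disconnect the third node.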

Global network information requires huge communication overhead for its mainte- 
nance when nodes are mobile or have frequent changes between active and sleeping 
periods. Localized solutions are therefore preferred. In a localized solution to any 
problem, the node makes (forwarding or structure) decisions solely based on the posi- 
tion of itself and its neighboring nodes. In addition, it may require constant amount of 
additional information, such as position of destination in case of routing. In case of 
LMST, both construction and maintenance are fully localized. In some cases (such as
MST), however, local changes to a structure may trigger changes in a different part of
the network, and therefore could have global impact on a structure. We refer to such
protocols as being quasi-localized if any node in the update process still makes
decisions based on local information, but a ‘wave’ of propagation messages may occur.

Nodes in ad hoc network are assumed to either know their geographic position 
(using, for example, GPS), or are able to determine mutual distances based on signal 
strength or time delays. This assumption is in accordance with literature. 

Li, Hou and Sha [4] recently proposed a localized algorithm to approximate MST.
The algorithm constructs a local minimal spanning tree, where each node finds the MST
of the subgraph of its neighbors, and an edge is kept in LMST (localized minimal
spanning tree) if and only if both endpoints have it in their respective local trees. In
this article, we propose to use the longest LMST edge to approximate R using
a wave propagation quasi-localized algorithm. The wave propagation algorithm is
adapted from [3], where it was used for leader election. We simply propagate the
longest LMST edge instead of propagating the winning leader information.
In order to determine whether the longest LMST edge is a reasonably good
approximation of the desired R, we study some characteristics of LMST in two dimensions
(2D) and three dimensions (3D). We observed that, although the number of additional
edges in LMST with respect to MST is very small (under 5% of additional edges), they tend
to be relatively long edges, and can extend R by about 33% of its range on networks
with up to 500 nodes.

In some applications, such as mesh networks for wireless Internet access, or sensor
networks for monitoring the environment, the nodes are mostly static and the network does
not change too frequently (as it would if mobility were involved). Increasing the
transmission range by 33% may easily double the energy expenditure, depending on the
constants α and c in the energy consumption model, and increases interference with
other traffic. On the other hand, the larger value of R provides
redundancy in routing, which is useful especially in a dynamic setting. These observations
about the structure of LMST and increased power consumption motivated us to design 
an algorithm for constructing MST topology from LMST topology, without the aid of 
any central entity. The proposed algorithm needs less than 7 messages per node on 
average (on networks up to 500 nodes). It eliminates LMST edges which are not in 
MST by a loop breakage procedure, which iteratively follows dangling edges from 
leaves to LMST loops, and breaks loops by eliminating their longest edges, until the 
procedure finishes at a single node (as a byproduct, this single node can also be con- 
sidered as an elected leader of the network). This so elected leader also learns longest 
MST edge in the process, and may broadcast it to other nodes. 

We made two sets of experiments (using Matlab environment) for the MST con- 
struction from the LMST. In one scenario, nodes are static and begin constructing 
MST from LMST more or less simultaneously. In the second set of experiments, we 
study the maintenance of already constructed MST when a new node is added to the 
network. 

This paper is organized as follows: Section 2 presents the related work for the 
stated problem. In section 3 we present some characteristics of MST and LMST ob- 
tained by experiments, for 2D and 3D. The main characteristics of interest to this 
study are the lengths of the longest MST and LMST edges. In section 4, we describe 
the adaptation of wave propagation protocol [3] for disseminating R throughout the 
network. The algorithm for constructing MST from the LMST is explained in detail in 
section 5. Section 6 gives performance evaluation of the algorithm for constructing 
MST from LMST. Section 7 describes an algorithm for updating MST when a single 
node is added to the network, and gives results of its performance evaluation. Section 
8 concludes this paper and discusses relevant future work. 



2 Related Work 

In [2], Dai and Wu proposed three different algorithms to compute the minimal
uniform transmission power of the nodes, using area-based binary search, Prim’s
minimum spanning tree, and its extension with a Fibonacci heap implementation.
However, all solutions are globalized, where each node is assumed to have full network
information (or centralized, assuming a specific station has this information and
informs network nodes about MST). We are interested in a quasi-localized algorithm,
where each node uses only local knowledge of its 1-hop neighbors, and the
communication propagates throughout the network until MST is constructed.

In [6], Narayanaswamy et al. presented a distributed protocol that attempts to de- 
termine the minimum common transmitting range needed to ensure network connec- 
tivity. Their algorithm runs multiple routing daemons (RDs), one at each power level 
available. Each RD maintains a separate routing table where the number of entries in 
the routing table gives the number of reachable nodes at that power level. The node 
power level is set to be the smallest power level at which the number of reachable 
nodes is the same as that of the max power level. The kernel routing table is set to be 
the routing table corresponding to this power level. The protocol apparently requires 
more messages per each node, and at higher power levels, than the protocol presented 
here. 

Penrose [7] [8] investigated the longest edge of the minimal spanning tree. The 
critical transmission range for preserving network connectivity is the length of the 
longest edge of the Euclidean MST [7] [8] [10]. The only algorithm these articles offer 
is to find MST and then its longest edge, without even discussing the distributed im- 
plementation of the algorithm. 

Santi and Blough [10] show that, in two and three dimensions, the transmitting 
range can be reduced significantly if weaker requirements on connectivity are accept- 
able. Halving the critical transmission range, the longest connected component con- 
tains 90% of nodes, approximately. This means that a considerable amount of energy 
is spent to connect relatively few nodes. 

A localized MST based topology control algorithm for ad hoc networks was
proposed in [4] by Li, Hou and Sha. Each node u first collects the positions of its one-hop
neighbours N1(u). Node u then computes the minimum spanning tree MST(N1(u)) of
N1(u). Node u keeps a directed edge uv in LMST if and only if uv is also an edge in
MST(N1(v)). If each node already has 2-hop neighbouring information, the
construction does not involve any message exchange between neighboring nodes. Otherwise
each node contacts neighbors along its LMST link candidates, to verify the status at
the other node. The variant with the union of edge candidates rather than their common
intersection is also considered in [4], possibly leading to a directed graph (no message
exchange is then needed even with 1-hop neighbour information). In [5], Li et al.
showed that LMST is a planar graph (no two edges intersect). Then they extended the
LMST definition to k-hop neighbours, that is, the same construction but with each
node having more local knowledge. They also prove that MST is a subset of 2-hop
based LMST, but not that MST is a subset of the 1-hop based LMST considered in this
article. We observed, however, on their diagrams that LMST with 2-hop and higher
local knowledge was mostly identical to the one constructed with merely 1-hop
knowledge, and decided to use only that limited knowledge, therefore saving the
communication overhead needed to maintain k-hop knowledge.
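
The 1-hop LMST construction described above can be sketched centrally as follows, assuming node positions are known as coordinate tuples and all nodes share a common radius that defines the unit graph. The names `local_mst` and `lmst` are hypothetical. Each local tree is built with Kruskal's algorithm on the subgraph induced by a node and its neighbors, and ties between equal-length edges are broken by node indices so that all nodes order edges identically (a tie-breaking rule assumed here, not taken from the text).

```python
import math
from itertools import combinations

def local_mst(nodes, points, radius):
    """Kruskal's algorithm on the unit graph induced by `nodes`."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]    # path halving
            v = parent[v]
        return v

    edges = []
    for u, v in combinations(sorted(nodes), 2):
        d = math.dist(points[u], points[v])
        if d <= radius:
            edges.append((d, u, v))          # (d, u, v) gives a strict total order
    tree = set()
    for d, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree.add((u, v))                 # stored with u < v
    return tree

def lmst(points, radius):
    """Keep edge uv iff it appears in the local trees of both u and v."""
    n = len(points)
    local = []
    for u in range(n):
        nbrs = [v for v in range(n)
                if math.dist(points[u], points[v]) <= radius]  # includes u itself
        local.append(local_mst(nbrs, points, radius))
    kept = set()
    for a in range(n):
        for (x, y) in local[a]:
            if x == a and (x, y) in local[y]:
                kept.add((x, y))
    return kept
```

Calling `local_mst(range(len(points)), points, radius)` gives the spanning tree (or forest) of the whole unit graph, which can be compared edge by edge with the output of `lmst`.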

In [3], Dulman et al. proposed a wave leader election protocol. Each node is
assigned a unique ID from an ordered set. Their algorithm selects as leader the node
with minimum ID. In the wave propagation algorithm [3], each node maintains a rec- 
ord of the minimum ID it has seen so far (initially its own). Each node receiving a 
smaller ID than the one it kept as currently smallest updates it before the next round. 
In each round, each node that received smaller ID in the previous round will propa- 
gate this minimum on all of its outgoing edges. After a number of rounds, a node 
elects itself as leader if the minimum value seen in this node is the node’s own ID; 
otherwise it is a non-leader. 
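
A synchronous simulation of this minimum-ID wave might look like the sketch below. It is an illustration under stated assumptions rather than the protocol of [3] itself: the function name `wave_leader_election` is hypothetical, the network is given as an adjacency dictionary, every node announces its own ID in the first round, and the simulation simply runs until no node has anything new to send.

```python
def wave_leader_election(adj):
    """Synchronous min-ID propagation; adj maps node ID -> iterable of neighbor IDs."""
    seen = {v: v for v in adj}      # smallest ID each node has seen so far
    senders = set(adj)              # round 1: every node announces its own ID
    rounds = 0
    while senders:
        inbox = {v: [] for v in adj}
        for u in senders:
            for v in adj[u]:
                inbox[v].append(seen[u])
        senders = set()
        for v, msgs in inbox.items():
            if msgs and min(msgs) < seen[v]:
                seen[v] = min(msgs)
                senders.add(v)      # only nodes that learned a smaller ID resend
        rounds += 1
    leaders = [v for v in adj if seen[v] == v]
    return leaders, seen, rounds
```

On a connected network every node ends up holding the globally smallest ID, so exactly one node elects itself leader.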
We will apply the FACE routing algorithm [1] in our protocol for converting LMST
into an MST. FACE routing guarantees delivery and requires a planar graph.
Starting from the source node, faces that intersect the imaginary line from source to
destination are traversed. The traversal of each face is made from the first edge
intersecting this imaginary line to the second one. The reader is referred to [1] for
more details.



3 Comparing Longest Edges of MST and LMST 

Theorem 1. MST is a subset of LMST. 

Proof. The well-known Kruskal’s algorithm for constructing MST sorts all edges in
increasing order, and considers these edges one by one for inclusion in MST. MST
initially has all vertices but no edges. An edge is included into the already constructed
MST if and only if its addition does not create a cycle in the already constructed MST.
Let LMST(A) be the minimal spanning tree constructed from n(A), which is the set
containing A and its 1-hop neighbors. We will show that if an edge from MST has
endpoints in n(A) then it belongs to LMST(A). Suppose that this is not correct, and let e be
the shortest such edge. LMST(A) may also be constructed by following the same
Kruskal’s algorithm. Thus edges from A to its neighbors and between neighbors of A
are sorted in increasing order. They are then considered for inclusion in LMST(A).
Thus, when e is considered, since it is not included in LMST(A), it creates a cycle in
LMST(A). All other edges in the cycle are shorter than e. Since e is in MST, at least
one of the edges from the cycle cannot be in MST, but is in LMST(A) since it was already
added to it. However, this contradicts the choice of e being the shortest edge of MST
not being included in LMST(A). Therefore each edge AB from MST belongs to both
LMST(A) and LMST(B), and therefore to LMST. ♦

We are interested in the viability of using the LMST topology for approximating 
the minimal transmission radius R. Matlab was used to derive some characteristics of 
the LMST topology in 2D and in 3D. We generate unit graphs of n nodes (n = 10, 20,
50, 100, 200 and 500), each randomly distributed over an area of 1 x 1 for the 2D case 
and over a volume of 1 x 1 x 1 for the 3D case. The following characteristics (some of 
them are presented for possible other applications) of LMST and MST were compared: 

• Average degree (average number of neighbors for each node) 

• Average maximal number of neighbors 

• Percentage of nodes which have degrees 1, 2, 3, 4, 5, degree > 5 

• Highest degree of a node ever found in any of tests 

• Average Maximal radius 

• Standard deviation of average maximal radius 

For each n we ran 200 tests in order to obtain more reliable results. The following
tables show the obtained results. It is well known that the average degree (average
number of neighbors per node) of an MST with n nodes is always 2 − 2/n, since it has
n − 1 edges and n nodes. This value is entered in the tables below. Table 1 shows the results
for 500 nodes (similar data are obtained for other values of n). Note that LMST has
<5% more edges than MST.
Table 1. MST vs LMST for 500 nodes

                                   MST 2D   LMST 2D   MST 3D   LMST 3D
Average Degree                      1.996     2.114    1.996     2.192
Average Max Number of Neighbors     3.900     3.900    4.400     4.500
Highest Degree                      4.000     4.000    5.000     5.000
Average Maximal Radius              0.081     0.103    0.177     0.214
Std Deviation Maximal Radius        0.010     0.005    0.014     0.017

Table 2 presents the ratios of the longest MST and LMST edges, for various
numbers of nodes. The ratio is always >0.75. This means that, on average, the longest LMST
edge may be up to about 1/3 longer than the longest MST edge. Using the longest
transmission radius from LMST instead of MST may therefore require about twice
as much energy. This discovery motivated us to design a procedure for
converting LMST into an MST.
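
The energy estimate can be checked with a short calculation based on the power model from the introduction, in which transmission power is proportional to the radius raised to the power α, plus a constant c. The sketch below is only a worked illustration: the function name `energy_ratio` and the normalization of the MST radius to 1 are assumptions.

```python
def energy_ratio(radius_ratio, alpha, c=0.0, r_mst=1.0):
    """(r_lmst**alpha + c) / (r_mst**alpha + c), where r_mst / r_lmst = radius_ratio."""
    r_lmst = r_mst / radius_ratio
    return (r_lmst ** alpha + c) / (r_mst ** alpha + c)
```

With an MST/LMST radius ratio of 0.75, α = 2 and c = 0, the result is (4/3)^2 = 16/9, roughly 1.78, which agrees with the "about twice" estimate; larger values of α increase the ratio, while a larger constant c brings it closer to 1.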



Table 2. Ratio of Average Maximal Radius of MST and LMST

N      MST/LMST Average Maximal Radius, 2D   MST/LMST Average Maximal Radius, 3D
10     0.9392                                0.9421
20     0.8635                                0.8650
50     0.7923                                0.8043
100    0.7944                                0.7978
200    0.7515                                0.7588
500    0.7864                                0.8271

As we can see from Table 1, the maximum degree of any node obtained for LMST
in all the tests was 5. This means that LMST maintains a relatively low degree
independently of the size of the network and its density (our study is based on maximal
density, or complete graphs). Since the area where nodes were placed remained fixed
while more nodes were placed, the maximal transmission radius decreased as the
number of nodes increased.



Table 3. Percentage of nodes which have degree 1, 2, 3, 4, 5 and > 5 for the 2D MST

Degree     1          2          3          4         5    >5
N = 10     35.5 %     49.5 %     14.5 %     0.5 %     0    0
N = 20     28.75 %    53 %       17.75 %    0.5 %     0    0
N = 50     23.8 %     56.8 %     19 %       0.4 %     0    0
N = 100    22.8 %     57.05 %    19.5 %     0.65 %    0    0
N = 200    22.48 %    56.7 %     20.18 %    0.65 %    0    0
N = 500    22.1 %     55.9 %     21.2 %     0.8 %     0    0

Tables 3-6 show the percentages of nodes which have degree 1, degree 2, degree 3, 
degree 4, degree 5, and degree >5, for the MST and the LMST. It can be observed that 
about half nodes have degree two, and <2% of nodes in 2D have degree >3 and <5% 
of nodes in 3D have degree >3. 
Table 4. Percentage of nodes which have degree 1, 2, 3, 4, 5 and > 5 for the 2D LMST

Degree     1          2          3          4         5    >5
N = 10     27 %       55.5 %     16 %       1.5 %     0    0
N = 20     20.75 %    54 %       24.25 %    1 %       0    0
N = 50     16.3 %     58.8 %     23.9 %     1 %       0    0
N = 100    16 %       58.75 %    24.4 %     0.85 %    0    0
N = 200    15.82 %    58.7 %     24.62 %    0.85 %    0    0
N = 500    15.52 %    58.64 %    24.82 %    1.02 %    0    0

Table 5. Percentage of nodes which have degree 1, 2, 3, 4, 5 and > 5 for the 3D MST

Degree     1          2          3          4         5         >5
N = 10     35.5 %     50 %       13.5 %     1 %       0         0
N = 20     35.75 %    40.75 %    21.25 %    2.25 %    0         0
N = 50     31.2 %     45.2 %     20 %       3.6 %     0         0
N = 100    30.65 %    44.3 %     21.55 %    3.4 %     0.1 %     0
N = 200    29.02 %    46.05 %    21.98 %    2.8 %     0.15 %    0
N = 500    28.02 %    46.07 %    22.02 %    3.67 %    0.22 %    0


Table 6. Percentage of nodes which have degree 1, 2, 3, 4, 5 and > 5 for the 3D LMST

Degree     1          2          3          4         5         >5
N = 10     26 %       55 %       17 %       2 %       0 %       0
N = 20     23.75 %    47.25 %    24.5 %     4.25 %    0.25 %    0
N = 50     22 %       46.8 %     26 %       5.2 %     0 %       0
N = 100    21.55 %    45.95 %    27.7 %     4.65 %    0.15 %    0
N = 200    18.85 %    47.67 %    29.07 %    4.17 %    0.23 %    0
N = 500    17.34 %    48.12 %    31.03 %    3.24 %    0.27 %    0



4 Wave Propagation Quasi-localized Algorithm for Finding 
Transmission Radius from the Longest LMST Edge 

We adapt the wave propagation leader election algorithm [3] to find the longest
LMST edge. The basic idea is to replace the node ID with the longest edge
adjacent to each node in its LMST topology. Each node maintains a record of the
longest edge it has seen so far (initially its own longest LMST edge). In each
round, every node that received a larger edge in the previous round broadcasts
its new longest edge. At the end, all nodes hold the same longest edge, which is
used as the transmission radius. One drawback, or perhaps advantage, of this
protocol is that a node does not know when the wave propagation process has
finished. It is a drawback in the sense that a node may use not the proper
transmission radius, which is the same for all nodes, but a smaller one.
However, these smaller radii, which need not be equal at all nodes, still
preserve network connectivity, since each node's value is never smaller than
its own longest LMST edge. It is an advantage in the sense that the protocol is
very simple and can run as an ongoing process in dynamic ad hoc networks. It is
straightforward to apply when an LMST edge grows beyond the current transmission
radius R, in which case the new value can be propagated. However, when an edge
that was equal to R has shortened, reducing R in the network is not
straightforward, since the length of the second longest edge is not preserved
by the wave propagation algorithm. To address this issue, the k longest LMST
edges may be maintained, and the message to use the next smaller value is
broadcast from the neighbourhood of the event. The alternative is to initiate a
new wave propagation from the node that detects the problem with the edge that
determined the currently recognized R.

We did not implement this protocol, since it does not differ significantly from
the one in [3]; the reader is referred to that article for its performance. Most
importantly, the number of messages per node was <7 in all measurements there,
which were done in somewhat different settings, with denser graphs than LMSTs.
We can therefore expect a much lower message count per node in our application.
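The round structure of the wave propagation described above can be illustrated with a small synchronous simulation. This is only a sketch under stated assumptions: the function name `propagate_longest_edge`, the dictionary-based graph, and the convention that every node announces its own value in the first round are illustrative choices, not taken from [3].

```python
def propagate_longest_edge(adjacency, own_longest):
    """Synchronous sketch: a node rebroadcasts only after its record grew.

    adjacency   -- dict: node -> list of LMST neighbours (hypothetical input)
    own_longest -- dict: node -> length of its own longest LMST edge
    """
    record = dict(own_longest)      # longest edge seen so far, per node
    senders = set(adjacency)        # assumption: round 1, every node announces
    messages = 0
    rounds = 0
    while senders:
        rounds += 1
        # Values are fixed at the start of the round, before any delivery.
        outgoing = {u: record[u] for u in senders}
        updated = set()
        for u, value in outgoing.items():
            messages += 1           # one local broadcast per sending node
            for v in adjacency[u]:
                if value > record[v]:
                    record[v] = value
                    updated.add(v)
        senders = updated           # only nodes that learned a larger edge
    return record, messages, rounds


if __name__ == "__main__":
    path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    longest = {0: 0.5, 1: 0.2, 2: 0.3, 3: 0.1}
    print(propagate_longest_edge(path, longest))
```

At any intermediate round a node's record is at least its own longest LMST edge, which is why stopping early still leaves every LMST edge usable.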






[Figure panels: MST and LMST]



Fig. 1. MST and LMST comparison for 200 nodes 

Figure 1 illustrates the MST and LMST graphs for n = 200 nodes, in 2D and 3D.
These figures helped us gain insight into how to construct the MST efficiently
from the LMST.



5 Constructing MST from LMST in Ad Hoc Networks 

We can observe (see Fig. 1) that the only differences between the LMST and the
MST are the loops that appear in the LMST but not in the MST, created by edges
present in the LMST but not in the MST. Our main idea is to ‘break’ the LMST
loops in order to obtain the MST.

We are now ready to describe our proposed scheme for converting the LMST into
the MST. It consists of several iterations. Each iteration consists of two
steps: traversing or eliminating dangling (tree) edges, and breaking some loops.
These two steps repeat until the process ends in a single node. That node is, as
a byproduct, the network leader, and it learns the longest MST edge in the
process (it may also learn the longest LMST edge). The value of the longest MST
edge can then be broadcast to the other network nodes. Details of this process
are as follows.

Tree step. Each leaf of the LMST initiates a broadcast up the tree. Each node
that receives such upward messages from all but one neighbor declares itself a
dangling node and continues with upward messages. All edges traversed in this
way belong to the MST. Each such traversed subtree decides, at the end, on its
candidate for R (the maximal radius of the MST). This tree advance stops at an
LMST loop, at a node that will be called the breaking node because of its role
in the sequel. Figure 2 illustrates this tree step; traversed dangling nodes and
edges are shown by arrows. More precisely, a breaking node is a node that
receives at least one message from a dangling node/edge and, after some
predefined timeout, remains with two or more neighbors left. Breaking nodes are
exactly the nodes that initiate the loop step, the second of the two repeating
steps. Note that an alternative choice is for all nodes on any loop to become
breaking nodes. We did not select this option since, on average, it would
generate more messages, and our goal is to reduce the message count. In some
special cases, however, the LMST may not have any leaf (e.g. the LMST being a
ring). In this case, the node(s) that decided to create the MST may declare
themselves breaking nodes after a certain timeout, if no related message is
received in the meantime. In other special cases, such as a sensor network
trained from the air or from a monitoring station, signals to create the MST may
arrive externally as part of training.
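The effect of the tree step can be sketched centrally by repeatedly peeling nodes that have a single remaining neighbour. This is an illustrative simplification, not the message-passing protocol itself: the name `peel_dangling` and the adjacency-list input are assumptions, and timeouts and candidate values for R are omitted.

```python
from collections import deque


def peel_dangling(adjacency):
    """Peel degree-1 nodes; return (traversed tree edges, nodes left over).

    adjacency -- dict: node -> list of neighbours in the LMST (hypothetical)
    """
    degree = {u: len(nbrs) for u, nbrs in adjacency.items()}
    removed = set()
    queue = deque(u for u, d in degree.items() if d == 1)   # the leaves
    tree_edges = []
    while queue:
        u = queue.popleft()
        if u in removed or degree[u] != 1:
            continue
        # The only neighbour of u that has not been peeled yet.
        parent = next(v for v in adjacency[u] if v not in removed)
        removed.add(u)
        tree_edges.append((u, parent))      # a dangling edge, kept for the MST
        degree[u] = 0
        degree[parent] -= 1
        if degree[parent] == 1:             # parent became dangling as well
            queue.append(parent)
    remaining = set(adjacency) - removed    # nodes on loops, or a single node
    return tree_edges, remaining
```

On a graph that is already a tree the peeling ends in a single node; otherwise the nodes left over lie on loops, and those adjacent to a peeled edge play the role of breaking nodes.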

Loop step. Each of the breaking nodes from the previous step initiates a loop
traversal to find and break the longest edge in the loop. Consider a breaking
node A on one such loop. It has, in general, two neighbors on the loop, and
edges AB and AC. Note that in some special cases a breaking node may also be a
branching node, that is, it could have more than two neighbors; in that case
consider the clockwise and counterclockwise neighbors with respect to the
incoming dangling edge. Also, in general, breaking node A belongs to two loops
(that is, two faces of the considered planar graph). Let ||AB|| > ||AC||. Node A
selects direction AC to traverse the cycle, that is, the direction of the
shorter edge. The message that started at A advances using the FACE routing
algorithm [1]. Dangling edges from previous tree steps are ignored in the loop
steps. If a branching node is encountered, the traversal splits into two
traversals, one for each face of the edges traversed so far. The advance stops
at a node D whose following neighbor E is such that ||AB|| < ||DE||. This means
that a longer edge in the cycle has been detected, and AB is not eliminated.
Node D is then declared the new breaking node, and starts the same longest-edge
verification algorithm. If the message returns to A then the longest edge DE
from the loop is eliminated. Endpoints D and E of each such broken LMST edge
then follow step 1, upward tree climbing, until they reach another LMST cycle.
This process continues until all upward tree messages meet at a single node,
which means that the MST is constructed.
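The net effect of breaking every loop at its longest edge can be checked with a centralized sketch. Note that this swaps in the classical reverse-delete procedure (drop the longest remaining edge whenever the graph stays connected) in place of the distributed FACE-based traversal; the names `connected` and `break_loops` and the edge-list format are illustrative assumptions.

```python
def connected(nodes, edges):
    """True if the undirected graph (nodes, edges) is connected."""
    adjacency = {u: set() for u in nodes}
    for u, v, _ in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    start = next(iter(nodes))
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for v in adjacency[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == len(nodes)


def break_loops(nodes, edges):
    """Reverse-delete sketch: edges are (u, v, length) triples of a connected
    graph, such as an LMST; the result is a spanning tree."""
    kept = list(edges)
    for edge in sorted(edges, key=lambda e: e[2], reverse=True):
        if len(kept) == len(nodes) - 1:
            break                            # no loop is left
        trial = [e for e in kept if e != edge]
        if connected(nodes, trial):          # edge was the longest on a loop
            kept = trial
    return kept


if __name__ == "__main__":
    nodes = [0, 1, 2, 3]
    edges = [(0, 1, 1.0), (1, 2, 2.0), (2, 3, 1.5), (3, 0, 3.0), (0, 2, 2.5)]
    print(break_loops(nodes, edges))
```

An edge that can be dropped without disconnecting the graph lies on a loop, and because edges are examined in decreasing length it is the longest edge on that loop, which mirrors the rule used in the loop step.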

The described algorithm has several byproducts, in addition to constructing the
MST. The last node in the construction can be selected as the leader of the
network, especially because it is expected to be fairly centrally positioned.
Next, this leader node may, in the process, learn the longest LMST and longest
MST edges, and may initiate a simple broadcasting algorithm for these values,
which will be used as the transmission radius for the whole network. This is
especially needed for the MST, as part of the algorithm that finds the minimum
transmission radius and informs the nodes about it. In the case of the LMST, we
already observed that the wave propagation algorithm can be used instead
immediately.







Theorem 2. The tree obtained from the LMST by applying the described loop
breakage algorithm is the MST.

Proof. Planarity of the LMST [5] shows that it consists of well defined faces,
and therefore the loop breakage process does create a tree at the end, by
‘opening’ up all closed faces. Theorem 1 proves that the MST is a subset of the
LMST. The algorithm described above clearly breaks every loop by eliminating its
longest edge. Suppose that the tree obtained at the end of the process is not
the MST. Since both trees have an equal number of edges, let e be the shortest
MST edge that was not included in the tree. Edge e was eliminated at some point,
since it was the longest edge in a loop of the LMST. Subsequently, more edges
from that loop may be eliminated, thus increasing the length of the loop that
would have been created if e were returned to the graph. However, in all these
subsequent loops, including the one at the very end of the process, e remains
the longest edge, since the subsequently eliminated edges, from loops that would
contain e, are originally shorter than e, but always longest among the new edges
that appear in the loop when they are eliminated. Since e is in the MST, at
least one other edge f from that final loop is not in the MST. This edge f is
shorter than e, as explained. However, if e is replaced by f, the graph remains
connected and remains a tree, with overall smaller weight than the MST. This is
a contradiction. Therefore the tree that remains at the end is indeed the MST. ♦



6 Performance Evaluation of the Algorithm for Constructing 
Minimal Spanning Trees 

We consider a network of n nodes, with nodes randomly distributed over an area
of 1 × 1. We constructed the LMST for several values of n (n = 10, 20, 60, 100,
200, 300 and 500), with 200 generated networks for each n. The described
algorithm was simulated, and we measured the following characteristics:

• Average number of messages in the network, for constructing MST from LMST 

• Maximal number of messages in any of generated networks 

• Minimal number of messages in any of generated networks 

• Average number of messages per node 

• Standard deviation of message counts per node 

• Average number of iterations (that is, how many times loop step was applied) 



Table 7. Message counts in the algorithm that constructs MST from LMST

Nodes   Average     Max         Min         Mean of     Std of      Average
        number of   number of   number of   messages    messages    Number of
        messages    messages    messages    per node    per node    Iterations
10      16.10       34          9           1.61        0.73        0.120
20      48.64       81          19          2.43        1.37        0.700
60      257.16      1083        108         4.29        3.08        1.860
100     518.94      1381        253         5.19        4.01        4.660
200     1265.25     2161        717         6.33        5.11        8.120
300     1952.35     2695        1462        6.51        5.13        8.900
500     3490.75     3957        2473        6.98        5.80        13.10



It appears that the average number of messages per node is approximately
$\log_2 n - 2$. Therefore it increases with the network size, but only very
slowly. This is not surprising, since the MST is a global structure, where a
change in one part of the network affects decisions made in other parts of the
network, and this happens at various levels of the hierarchy.
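The quality of this estimate can be checked against the per-node means of Table 7. The sketch below assumes the estimate is read as the base-2 logarithm of n minus 2; the dictionary `measured` simply copies the "Mean of messages per node" column.

```python
import math

# Mean number of messages per node, copied from Table 7.
measured = {10: 1.61, 20: 2.43, 60: 4.29, 100: 5.19,
            200: 6.33, 300: 6.51, 500: 6.98}


def estimate(n):
    """Assumed reading of the estimate: log2(n) - 2."""
    return math.log2(n) - 2


for n, mean in measured.items():
    gap = mean - estimate(n)
    print(f"n={n:3d}  measured={mean:.2f}  log2(n)-2={estimate(n):.2f}  gap={gap:+.2f}")
```

Under this reading the estimate stays within one message per node of every measured mean, and it is closest for n = 500.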



7 Updating MST after Adding One Node 

Mobility of nodes, or changes in node activity status, will cause changes in the
MST topology. We now design a simple algorithm for updating the MST when a new
node is added to the network. The added node first constructs its own LMST, that
is, the MST of itself and its 1-hop neighbors. There are two cases to consider.
The simpler case is when this LMST contains only one edge. In this case, the
edge is added to the MST, and no further updates are needed.

The case when the LMST at the given node contains more than one edge is
nontrivial, and requires a loop breaking procedure to find and eliminate the
longest edges in the newly created cycles. For example, in Fig. 3, the LMST of
the new node has three edges. The update procedure assumes that the MST is
organized as a tree, rooted at the leader found in the construction process.
This tree can be constructed during the MST construction algorithm, or the
leader can additionally construct or complete the tree while informing the
nodes about the length of the longest edge in the MST. All edges in the MST are
oriented toward the leader. In this way, branching is completely avoided in the
traversal. The decision to use the leader in the update process is made in
order to avoid traversing the long open LMST face (in the MST, this open face
contains all nodes and edges of the MST; in fact, each edge is found twice on
that open face) unnecessarily with each non-trivial node addition.




Thus, if the LMST at the added node contains k>1 neighbors, k traversals toward
the leader are initiated. Each traversal records the longest edge along the path
traversed, and each stops at the leader node. However, some branching nodes may
receive two copies of such traversal messages. These nodes recognize the
completion of a loop, and can decide which longest loop edge is to be
eliminated. In the example in Fig. 3, the traversal that starts to the left of
the new node ends at the leader node, with each intermediate node being visited
only once. The other two ‘meet’ at the indicated branching node B. Such a node B
may forward only one of the traversals toward the leader, and may stop the
second incoming traversal. Starting with k traversals, k-1 loops will be
recognized, each at the leader node or at an interim branching node. These nodes
may learn the longest edge in the process, and may send backward messages toward
it to ask for the edge to be broken.

The described algorithm was implemented. Over 100 tests, the average number of
messages in a network with 200 nodes for updating the MST when one node is added
was 34.8. It therefore appears that the MST construction with synchronous start
from the LMST, requiring fewer than 7 messages per node, leads to significant
communication savings.
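The update rule can be illustrated with a centralized sketch: the first edge of the new node is simply attached, and every further edge closes one loop, whose longest edge is then removed. The names `tree_path` and `add_node` and the nested-dictionary representation of the MST are hypothetical; the leader-oriented traversal and its messages are not modelled.

```python
def tree_path(adjacency, source, target):
    """Edges (u, v, length) on the unique tree path from target back to source."""
    parent = {source: None}
    stack = [source]
    while stack:
        u = stack.pop()
        for v, w in adjacency[u].items():
            if v not in parent:
                parent[v] = (u, w)
                stack.append(v)
    path = []
    node = target
    while parent[node] is not None:
        prev, w = parent[node]
        path.append((prev, node, w))
        node = prev
    return path


def add_node(adjacency, new_node, candidate_edges):
    """adjacency: dict node -> {neighbour: length} describing the current MST.
    candidate_edges: list of (neighbour, length) for the new node's LMST edges."""
    adjacency[new_node] = {}
    for neighbour, length in candidate_edges:
        if adjacency[new_node]:
            # This edge would close a loop; collect the loop's edges.
            loop = tree_path(adjacency, new_node, neighbour)
            loop.append((new_node, neighbour, length))
            u, v, _ = max(loop, key=lambda e: e[2])
            if (u, v) == (new_node, neighbour):
                continue                    # the new edge itself is the longest
            del adjacency[u][v]             # break the loop at its longest edge
            del adjacency[v][u]
        adjacency[new_node][neighbour] = length
        adjacency[neighbour][new_node] = length
    return adjacency
```

With k candidate edges the sketch inspects k-1 loops, matching the count given in the text.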



8 Conclusions and Future Work 



LMST is a message-free localized structure in ad hoc networks, which contains
the MST as a subset and which has less than 5% additional edges not already
contained in the MST. We proposed to use the longest LMST edge to approximate
the minimal transmission radius R, whose exact value is the longest edge in the
MST. We compared some characteristics of the LMST and the MST. The average
degree of the MST is ≈1.94, compared to the ≈2.06 obtained with the LMST in the
2D case. In the 3D case the degree for the MST is ≈1.94 and the degree for the
LMST is ≈2.12. We also noted that most of the nodes had degree two or three. The
longest LMST edge can be spread throughout the network by applying a wave
propagation algorithm, previously proposed as a leader election algorithm.

Existing MST construction algorithms were based on global knowledge of the
network, or on some operations that, in distributed implementations, were not
performed between neighboring nodes. The main novelty of our proposed scheme is
that all communication is restricted to neighboring nodes; therefore the message
count is a realistic one. To design the new MST construction algorithm, we
observed that the difference between the LMST and the MST lies in some loops
present in the LMST, and that the number of these loops is not large. The
construction is based on ‘breaking’ these loops in iterations, with MST edges
being recognized between iterations. The proposed algorithm appears to have a
logarithmic (in the number of nodes in the network) number of messages per node.

Constructing the MST from the LMST, following this procedure, is beneficial when
the network considered is not very dynamic. Such scenarios include mesh networks
for wireless Internet access, with antennas placed on the roofs of buildings.
Sensor networks are normally static, but the usefulness of the construction
depends on the frequency of sleep period operations. We assume that sensors are
divided into groups, and that changes between active and passive states in
sensors occur inside groups, while the MST is constructed between groups, not
between individual sensors. This treatment of sensor networks justifies the
application of our MST construction, with certain limitations on accuracy that
come from changes in the active participating sensors of each group. In
particular, a reduced transmission range in sensor networks leads to energy
savings and prolonged network life. Reduced energy expenditure may also allow
less frequent topological changes.

We also described a very simple algorithm for updating the MST when a new node
is added to the network. If an existing node is deleted from the network, its
neighbors may similarly construct their new LMSTs, and a similar procedure (with
somewhat more details) can be applied.

Our proposed construction of the MST from the LMST works only in the 2D case. It
cannot be applied in 3D since the FACE algorithm [1] does not work in 3D. It is
therefore an open problem to design an algorithm for constructing the MST from
the LMST in 3D. Similarly, generalizing FACE routing with guaranteed delivery
[1] to 3D remains an outstanding open problem.

Acknowledgments. This research is supported by CONACYT project 37017-A, a
CONACYT Scholarship for the first author, and an NSERC grant of the second
author.

Abstract. We consider how to maintain the topological order of a directed
acyclic graph (DAG) in the presence of edge insertions and deletions. We present
a new algorithm and, although this has marginally inferior time complexity
compared with the best previously known result, we find that its simplicity
leads to better performance in practice. In addition, we provide an empirical
comparison against three alternatives over a large number of random DAGs. The
results show our algorithm is the best for sparse graphs and, surprisingly, that
an alternative with poor theoretical complexity performs marginally better on
dense graphs.



1 Introduction 

A topological ordering, ord, of a directed acyclic graph $G = (V, E)$ maps each
vertex to a priority value such that, for all edges $x \rightarrow y \in E$, it
is the case that $ord(x) < ord(y)$. There exist well known linear time
algorithms for computing the topological order of a DAG (see e.g. [4]). However,
these solutions are considered offline as they compute the solution from
scratch.
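For reference, one well known linear-time offline method is Kahn's algorithm, sketched below. The source does not say which offline algorithm [4] describes, so this choice is an assumption, as are the function name `topological_order` and the edge-list input.

```python
from collections import deque


def topological_order(vertices, edges):
    """Kahn's algorithm: return ord as a dict vertex -> priority in 1..|V|."""
    successors = {v: [] for v in vertices}
    indegree = {v: 0 for v in vertices}
    for x, y in edges:                      # edge x -> y
        successors[x].append(y)
        indegree[y] += 1
    ready = deque(v for v in vertices if indegree[v] == 0)
    ord_ = {}
    while ready:
        v = ready.popleft()
        ord_[v] = len(ord_) + 1             # next free priority
        for w in successors[v]:
            indegree[w] -= 1
            if indegree[w] == 0:            # all predecessors already placed
                ready.append(w)
    if len(ord_) != len(vertices):
        raise ValueError("graph contains a cycle")
    return ord_
```

Every call recomputes the whole ordering in time proportional to |V| + |E|, which is the cost an online algorithm tries to avoid after a single edge change.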

In this paper we examine online algorithms, which only perform work neces- 
sary to update the solution after a graph change. We say that an online algorithm 
is fully dynamic if it supports both edge insertions and deletions. The main con- 
tributions of this paper are as follows: 

1. A new fully dynamic algorithm for maintaining the topological order of a 
directed acyclic graph. 

2. The first experimental study of algorithms for this problem. We compare 
against two online algorithms [15,1] and a simple offline solution. 

We show that, compared with [1], our algorithm has marginally inferior time 
complexity, but its simplicity leads to better overall performance in practice. 
This is mainly because our algorithm does not need the Dietz and Sleator 
ordered list structure [7]. We also find that, although [15] has the worst the- 
oretical complexity overall, it outperforms the others when the graphs are dense. 

Organisation: Section 2 covers related work; Section 3 begins with the
presentation of our new algorithm, followed by a detailed discussion of the two
previous solutions [1,15]. Section 4 details our experimental work. This
includes a comparison of the three algorithms and the standard offline
topological sort; finally, we summarise our findings and discuss future work in
Section 5.



2 Related Work 

At this point, it is necessary to clarify some notation used throughout the re- 
mainder. Note, in the following we assume G = (V, E) is a digraph: 

Definition 1. The path relation, $\leadsto$, holds if
$\forall x,y \in V.[x \leadsto y \iff x \rightarrow y \in E_T]$, where
$G_T = (V, E_T)$ is the transitive closure of G. If $x \leadsto y$, we say that
x reaches y and that y is reachable from x.

Definition 2. The set of edges involving vertices from a set, $S \subseteq V$,
is $E(S) = \{x \rightarrow y \mid x \rightarrow y \in E \wedge (x \in S \vee y \in S)\}$.

Definition 3. The extended size of a set of vertices, $K \subseteq V$, is
denoted $\|K\| = |K| + |E(K)|$. This definition originates from [1].

The offline topological sorting problem has been widely studied and optimal
algorithms with $\Theta(\|V\|)$ (i.e. $\Theta(|V| + |E|)$) time are known (see
e.g. [4]). However, the problem of maintaining a topological ordering online
appears to have received little attention. Indeed, there are only two existing
algorithms which, henceforth, we refer to as AHRSZ [1] and MNR [15]. We have
implemented both and will detail their working in Section 3. For now, we wish
merely to examine their theoretical complexity. We begin with results previously
obtained:

- AHRSZ: achieves $O(\|\delta\| \log \|\delta\|)$ time complexity per edge
  insertion, where $\delta$ is the minimal number of nodes that must be
  reprioritised [1,19].

- MNR: here, an amortised time complexity of $O(|V|)$ over $O(|E|)$ insertions
  has been shown [15].

There is some difficulty in relating these results as they are expressed
differently. However, they both suggest that each algorithm has something of a
difference between best and worst cases. This, in turn, indicates that a
standard worst-case comparison would be of limited value. Determining
average-case performance might be better, but is a difficult undertaking.

In an effort to find a simple way of comparing online algorithms the notion of
bounded complexity analysis has been proposed [20,1,3,19,18]. Here, cost is
measured in terms of a parameter $\delta$, which captures in some way the
minimal amount of work needed to update the solution after some incremental
change. For example, an algorithm for the online topological order problem will
update ord, after some edge insertion, to produce a valid ordering ord'. Here,
$\delta$ is viewed as the smallest set of nodes whose priority must change
between ord and ord'. Under this system, an algorithm is described as bounded if
its worst-case complexity can be expressed purely in terms of $\delta$.

Ramalingam and Reps have also shown that any solution to the online topological
ordering problem cannot have a constant competitive ratio [19]. This suggests
that competitive analysis may be unsatisfactory in comparing algorithms for this
problem.

In general, online algorithms for directed graphs have received scant atten- 
tion, of which the majority has focused on shortest paths and transitive closure 
(see e.g. [14,6,8,5,9,2]). For undirected graphs, there has been substantially more 
work and a survey of this area can be found in [11]. 



3 Online Topological Order 

We now examine the three algorithms for the online topological order problem:
PK, MNR and AHRSZ, the first being our contribution. Before doing this, however,
we must first present and discuss our complexity parameter $\delta_{xy}$.

Definition 4. Let $G = (V,E)$ be a directed acyclic graph and ord a valid
topological order. For an edge insertion $x \rightarrow y$, the affected region
is denoted $AR_{xy}$ and defined as
$\{k \in V \mid ord(y) \le ord(k) \le ord(x)\}$.

Definition 5. Let $G = (V,E)$ be a directed acyclic graph and ord a valid
topological order. For an edge insertion $x \rightarrow y$, the complexity
parameter $\delta_{xy}$ is defined as
$\{k \in AR_{xy} \mid y = k \vee x = k \vee y \leadsto k \vee k \leadsto x\}$.

Notice that $\delta_{xy}$ will be empty when x and y are already prioritised
correctly (i.e. when $ord(x) < ord(y)$). We say that invalidating edge
insertions are those which cause $|\delta_{xy}| > 0$. To understand how
$\delta_{xy}$ compares with $\delta$ and the idea of minimal work, we must
consider the minimal cover (from [1]):

Definition 6. For a directed acyclic graph $G = (V, E)$ and an invalidated
topological order ord, the set K of vertices is a cover if
$\forall x,y \in V.[x \leadsto y \wedge ord(y) < ord(x) \Rightarrow x \in K \vee y \in K]$.

This states that, for any connected x and y which are incorrectly prioritised, a
cover K must include x or y or both. We say that K is minimal, written
$K_{min}$, if it is not larger than any valid cover. Furthermore, we now show
that $K_{min} \subseteq \delta_{xy}$:

Lemma 1. Let $G = (V, E)$ be a directed acyclic graph and ord a valid
topological order. For an edge insertion $x \rightarrow y$, it holds that
$K_{min} \subseteq \delta_{xy}$.

Proof. Suppose this were not the case. Then a node $a \in K_{min}$, where
$a \notin \delta_{xy}$, must be possible. By Definition 6, a is incorrectly
prioritised with respect to some node b. Let us assume (for now) that
$b \leadsto a$ and, hence, $ord(a) < ord(b)$. Since ord is valid
$\forall e \in E$, except $x \rightarrow y$, any path from b to a must cross
$x \rightarrow y$. Therefore, $y \leadsto a$ and $b \leadsto x$ and we have
$a \in AR_{xy}$ as $ord(y) \le ord(a) < ord(b) \le ord(x)$. A contradiction
follows as, by Definition 5, $a \in \delta_{xy}$. The case when $a \leadsto b$
is similar.

In fact, $K_{min} = \delta_{xy}$ only when they are both empty. Now, the
complexity of AHRSZ is defined in terms of $K_{min}$ only and, thus, we know
that $\delta_{xy}$ is not strictly a measure of minimal work for this problem.
Nevertheless, we choose $\delta_{xy}$ as it facilitates a meaningful comparison
between the algorithms being studied.
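The two sets introduced in Definitions 4 and 5 can be computed directly for small examples. The sketch below assumes inclusive bounds for the affected region and works on the graph as it is before the edge is inserted; the function names and the dictionary-based graph are illustrative only.

```python
def reachable(successors, start):
    """Vertices reachable from start by a path of one or more edges."""
    seen = set()
    stack = [start]
    while stack:
        u = stack.pop()
        for w in successors.get(u, ()):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def affected_region(ord_, x, y):
    """AR_xy for the insertion x -> y (assumed inclusive at both ends)."""
    return {k for k in ord_ if ord_[y] <= ord_[k] <= ord_[x]}


def delta(ord_, successors, x, y):
    """delta_xy: members of AR_xy that are x, y, reachable from y, or reach x."""
    predecessors = {}
    for u, ws in successors.items():
        for w in ws:
            predecessors.setdefault(w, []).append(u)
    region = affected_region(ord_, x, y)
    forward = reachable(successors, y) | {y}
    backward = reachable(predecessors, x) | {x}
    return {k for k in region if k in forward or k in backward}


if __name__ == "__main__":
    # Hypothetical example: p lies between y and x but is unrelated to both.
    order = {"y": 1, "a": 2, "p": 3, "b": 4, "c": 5, "x": 6}
    graph = {"y": ["a"], "a": ["c"], "b": ["x"]}
    print(sorted(delta(order, graph, "x", "y")))
```

In the example, p belongs to the affected region but not to the complexity parameter, which is the saving that an algorithm bounded by this parameter can exploit.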

3.1 The PK Algorithm 

We now present our algorithm for maintaining the topological order of a graph
online. As we will see in the coming sections, it is similar in design to MNR,
but achieves a much tighter complexity bound on execution time. For a DAG G, the
algorithm implements the topological ordering, ord, using an array of size
$|V|$, called the node-to-index map or n2i for short. This maps each vertex to a
unique integer in $\{1 \ldots |V|\}$ such that, for any edge $x \rightarrow y$
in G, n2i[x] < n2i[y]. Thus, when an invalidating edge insertion
$x \rightarrow y$ is made, the algorithm must update n2i to preserve the
topological order property. The key insight is that we can do this by simply
reorganising nodes in $\delta_{xy}$. That is, in the new ordering, n2i', nodes
in $\delta_{xy}$ are repositioned to ensure a valid ordering, using only
positions previously held by members of $\delta_{xy}$. All other nodes remain
unaffected. Consider the following caused by invalidating edge
$x \rightarrow y$:




Here, nodes are laid out in topological order (i.e. increasing in n2i value from
left to right) with members of $\delta_{xy}$ shown. As n2i is a total and
contiguous ordering, the gaps must contain nodes, omitted to simplify the
discussion. The affected region contains all nodes (including those not shown)
between y and x. Now, let us partition the nodes of $\delta_{xy}$ into two sets:

Definition 7. Assume $G = (V,E)$ is a DAG and ord a valid topological order. Let
$x \rightarrow y$ be an invalidating edge insertion, which does not introduce a
cycle. The sets $R_F$ and $R_B$ are defined as
$R_F = \{z \in AR_{xy} \mid z = y \vee y \leadsto z\}$ and
$R_B = \{z \in AR_{xy} \mid z = x \vee z \leadsto x\}$.

Note, there can be no edge from a member of $R_F$ to any in $R_B$, otherwise
$x \rightarrow y$ would introduce a cycle. Thus, for the above graph, we have
$R_F = \{y, a, c\}$ and $R_B = \{b, x\}$. Now, we can obtain a correct ordering
by repositioning nodes to ensure all of $R_B$ are left of $R_F$, giving:




[Figure: the nodes after repositioning, with all of R_B to the left of R_F inside the affected region]

procedure add_edge(x, y)
  lb = n2i[y]; ub = n2i[x];
  if lb < ub then dfs-f(y); dfs-b(x); reassign();

procedure dfs-f(n)
  mark n as visited; R_F ∪= {n};
  forall n→w ∈ E do
    if n2i[w] = ub then abort; // cycle
    if w not visited ∧ n2i[w] < ub then dfs-f(w);

procedure reassign()
  sort(R_B); sort(R_F); L = ∅;
  for i = 0 to |R_B| - 1 do w = R_B[i]; R_B[i] = n2i[w]; unmark w; push(w, L);
  for i = 0 to |R_F| - 1 do w = R_F[i]; R_F[i] = n2i[w]; unmark w; push(w, L);
  merge(R_B, R_F, R);
  for i = 0 to |L| - 1 do n2i[L[i]] = R[i];



Fig. 1. The PK algorithm. The “sort” function sorts an array such that x comes
before y iff n2i[x] < n2i[y]. “merge” combines two arrays into one whilst
maintaining sortedness. “dfs-b” is similar to “dfs-f” except it traverses in the
reverse direction, loads into $R_B$ and compares against lb. Note, L is a
temporary.
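A compact runnable rendering of the procedure in Fig. 1 may help to follow the two stages. It is a sketch under several assumptions: the class name `OnlineTopologicalOrder` is hypothetical, indices start at 0 instead of 1, the searches are iterative instead of recursive, the separate sort and merge steps are folded into Python's `sorted`, and a self-loop is rejected explicitly.

```python
class OnlineTopologicalOrder:
    """Sketch of the PK insertion procedure (0-based indices)."""

    def __init__(self, vertices):
        self.n2i = {v: i for i, v in enumerate(vertices)}   # node-to-index map
        self.succ = {v: set() for v in vertices}
        self.pred = {v: set() for v in vertices}

    def add_edge(self, x, y):
        if x == y:
            raise ValueError("edge would create a cycle")
        lb, ub = self.n2i[y], self.n2i[x]
        if lb < ub:                         # invalidating insertion
            forward = self._forward(y, ub)  # aborts before any change on a cycle
            backward = self._backward(x, lb)
            self._reassign(forward, backward)
        self.succ[x].add(y)
        self.pred[y].add(x)

    def _forward(self, y, ub):
        """Nodes reachable from y inside the affected region (R_F)."""
        visited, stack = {y}, [y]
        while stack:
            n = stack.pop()
            for w in self.succ[n]:
                if self.n2i[w] == ub:       # reached x: y already reaches x
                    raise ValueError("edge would create a cycle")
                if w not in visited and self.n2i[w] < ub:
                    visited.add(w)
                    stack.append(w)
        return visited

    def _backward(self, x, lb):
        """Nodes that reach x inside the affected region (R_B)."""
        visited, stack = {x}, [x]
        while stack:
            n = stack.pop()
            for w in self.pred[n]:
                if w not in visited and self.n2i[w] > lb:
                    visited.add(w)
                    stack.append(w)
        return visited

    def _reassign(self, forward, backward):
        key = self.n2i.__getitem__
        # R_B first, then R_F, each in its existing relative order.
        nodes = sorted(backward, key=key) + sorted(forward, key=key)
        slots = sorted(self.n2i[v] for v in nodes)   # pool of freed indices
        for v, i in zip(nodes, slots):
            self.n2i[v] = i
```

Nodes outside the two visited sets keep their indices, so the amount of reordering is governed by the size of the visited sets, not by the size of the affected region.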

In doing this, the original order of nodes in $R_F$ must be preserved and
likewise for $R_B$. The reason is that a subtle invariant is being maintained:

$\forall x \in R_F.[\, n2i[x] \le n2i'[x] \,] \wedge \forall y \in R_B.[\, n2i'[y] \le n2i[y] \,]$

This states that members of $R_F$ cannot be given lower priorities than they
already have, whilst those in $R_B$ cannot get higher ones. This is because, for
any node in $R_F$, we have identified all in the affected region which must be
higher (i.e. right) than it. However, we have not determined all those which
must come lower and, hence, cannot safely move them in this direction. A similar
argument holds for $R_B$. Thus, we begin to see how the algorithm works: it
first identifies $R_B$ and $R_F$. Then, it pools the indices occupied by their
nodes and, starting with the lowest, allocates increasing indices first to
members of $R_B$ and then $R_F$. So, in the above example, the algorithm
proceeds by allocating b the lowest available index, like so:




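[Figure: the nodes b, x, y, a, c inside the affected region; b receives the lowest available index and the remaining positions are still to be assigned]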



after this, it will allocate x the next lowest index, then y and so on. The 
algorithm is presented in Figure 1 and the following summarises the two stages: 

Discovery: The set $\delta_{xy}$ is identified using a forward depth-first
search from y and a backward depth-first search from x. Nodes outside the affected





388 



D.J. Pearce and P.H.J. Kelly 



region are not explored. Those visited by the forward and backward search are
placed into $R_F$ and $R_B$ respectively. The total time required for this stage
is $O(\|\delta_{xy}\|)$.

Reassignment: The two sets are now sorted separately into increasing topological
order (i.e. according to n2i), which we assume takes
$O(|\delta_{xy}| \log |\delta_{xy}|)$ time. We then load $R_B$ into array L
followed by $R_F$. In addition, the pool of available indices, R, is constructed
by merging the indices used by elements of $R_B$ and $R_F$ together. Finally, we
allocate by giving index R[i] to node L[i]. This whole procedure takes
$O(|\delta_{xy}| \log |\delta_{xy}|)$ time.

Therefore, algorithm PK has time complexity
$O((|\delta_{xy}| \log |\delta_{xy}|) + \|\delta_{xy}\|)$. As we will see, this
is a good improvement over MNR, but remains marginally inferior to that for
AHRSZ and we return to consider this in Section 3.3. Finally, we provide the
correctness proof:
we provide the correctness proof: 

Lemma 2. Assume $D = (V,E)$ is a DAG and n2i an array, mapping vertices to
unique values in $\{1 \ldots |V|\}$, which is a valid topological order. If an
edge insertion $x \rightarrow y$ does not introduce a cycle, then algorithm PK
obtains a correct topological ordering.

Proof. Let n2i' be the new ordering found by the algorithm. To show this is a
correct topological order we must show, for any two vertices a, b where
$a \rightarrow b$, that n2i'[a] < n2i'[b] holds. An important fact to remember
is that the algorithm only uses indices of those in $\delta_{xy}$ for
allocation. Therefore,
$z \in \delta_{xy} \Rightarrow n2i[y] \le n2i'[z] \le n2i[x]$. There are six
cases to consider:

Case 1: $a, b \notin AR_{xy}$. Here neither a nor b has been moved as they lie
outside the affected region. Thus, n2i[a] = n2i'[a] and n2i[b] = n2i'[b] which
(by definition of n2i) implies n2i'[a] < n2i'[b].

Case 2: $(a \in AR_{xy} \wedge b \notin AR_{xy}) \vee (a \notin AR_{xy} \wedge b \in AR_{xy})$.
When $a \in AR_{xy}$ we know $n2i[a] \le n2i[x] < n2i[b]$. If
$a \in \delta_{xy}$ then $n2i'[a] \le n2i[x]$. Otherwise, n2i'[a] = n2i[a]. A
similar argument holds when $b \in AR_{xy}$.

Case 3: $a, b \in AR_{xy} \wedge a, b \notin \delta_{xy}$. Similar to case 1 as
neither a nor b has been moved.

Case 4: a, bGSxy A X'^a A x^ a. Here, a reachable from x only along x^y, 
which means y'^a A y^b. Thus, a, b€ Rp and their relative order is preserved 
in n2i' by sorting. 

Case 5: a,b€Sxy A b'^y A y^b. Here, b reaches y along x^y, so b-^x 
and a'^x. Therefore, a,b G Rp and their relative order is preserved in n2i' by 
sorting. 

Case 6: X = a A y = b. Here, we have oG Rp AbG Rp and n2i'[a] < n2i'\b] 
follows because all elements oi Rp are allocated lower indices than those oi Rp. 
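
The two stages of PK described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `pk_add_edge` and `_region_dfs`, the use of dictionaries of sets for successors (`succ`) and predecessors (`pred`), and the dictionary form of `n2i` are all choices made for this example.

```python
def _region_dfs(edges, start, in_region):
    """Collect nodes reachable from start via edges, staying inside the region."""
    seen, stack, found = {start}, [start], []
    while stack:
        n = stack.pop()
        found.append(n)
        for m in edges[n]:
            if m not in seen and in_region(m):
                seen.add(m)
                stack.append(m)
    return found


def pk_add_edge(succ, pred, n2i, x, y):
    """Insert edge x -> y, updating the topological order n2i in place."""
    lb, ub = n2i[y], n2i[x]
    if lb < ub:  # the new edge invalidates the current order
        # Discovery: forward search from y and backward search from x,
        # both confined to the affected region [lb, ub].
        r_f = _region_dfs(succ, y, lambda n: n2i[n] <= ub)
        if x in r_f:
            raise ValueError("edge would introduce a cycle")
        r_b = _region_dfs(pred, x, lambda n: n2i[n] >= lb)
        # Reassignment: sort each set by current index, load R_B then R_F
        # into one list, and hand out the pooled indices in increasing order.
        r_f.sort(key=lambda n: n2i[n])
        r_b.sort(key=lambda n: n2i[n])
        order = r_b + r_f
        pool = sorted(n2i[n] for n in order)
        for node, index in zip(order, pool):
            n2i[node] = index
    succ[x].add(y)
    pred[y].add(x)
```

Nodes outside the two discovered sets keep their indices, which is the property that cases 1 to 3 of the proof rely on.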

3.2 The MNR Algorithm 

The algorithm of Marchetti-Spaccamela et al. operates in a similar way to PK
by using a total ordering of vertices. This time two arrays, n2i and i2n, of size
$|V|$ are used with n2i as before. The second array, i2n, is the reverse mapping
of n2i, such that i2n[n2i[x]] = x holds, and its purpose is to bound the cost of
updating n2i. The difference from PK is that only the set $R_F$ is identified,
using a forward depth-first search. Thus, for the example we used previously
only y, a, c would be visited.

To obtain a correct ordering the algorithm shifts nodes in $R_F$ up the order
so that they hold the highest positions within the affected region.

[Figure: the order after shifting, with the affected region marked.]

Notice that these nodes always end up alongside x and that, unlike PK, each
node in the affected region receives a new position. We can see that this has
achieved a similar effect to PK as every node in $R_B$ now has a lower index than
any in $R_F$. For completeness, the algorithm is presented in Figure 2 and the
two stages are summarised in the following, assuming an invalidating edge $x \to y$:

Discovery: A depth-first search starting from y and limited to $AR_{xy}$
marks those visited. This requires $O(\|\delta_{xy}\|)$ time.

Reassignment: Marked nodes are shifted up into the positions immediately
after x in i2n, with n2i being updated accordingly. This requires
$O(|AR_{xy}|)$ time as each node between y and x in i2n is visited.

Thus we obtain, for the first time, the following complexity result for
algorithm MNR: $O(\|\delta_{xy}\| + |AR_{xy}|)$. This highlights an important difference in
the expected behaviour between PK and MNR as the affected region ($AR_{xy}$)
can contain many more nodes than $\delta_{xy}$. Thus, we would expect MNR to perform
badly when this is so.
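
The discovery and shift stages of MNR can be sketched in Python as below. This is an illustrative translation of the description above rather than the authors' code; it assumes nodes are the integers 0 to |V|-1, that `n2i` and `i2n` are plain lists, and that `succ` is a list of successor sets. The name `mnr_add_edge` is hypothetical.

```python
def mnr_add_edge(succ, n2i, i2n, x, y):
    """Insert edge x -> y, updating the arrays n2i and i2n in place."""
    lb, ub = n2i[y], n2i[x]
    if lb < ub:  # the new edge invalidates the current order
        # Discovery: forward search from y, limited to the affected region.
        visited, stack = {y}, [y]
        while stack:
            n = stack.pop()
            for s in succ[n]:
                if n2i[s] == ub:
                    raise ValueError("edge would introduce a cycle")
                if s not in visited and n2i[s] < ub:
                    visited.add(s)
                    stack.append(s)
        # Reassignment: unmarked nodes slide down, marked nodes are
        # collected and then placed in the highest positions of the region.
        moved, shift = [], 0
        for i in range(lb, ub + 1):
            w = i2n[i]  # w is the node at topological index i
            if w in visited:
                moved.append(w)
                shift += 1
            else:
                n2i[w] = i - shift
                i2n[i - shift] = w
        for j, w in enumerate(moved):
            pos = ub + 1 - shift + j
            n2i[w] = pos
            i2n[pos] = w
    succ[x].add(y)
```

Note that the shift loop walks every index between `lb` and `ub`, which is where the $O(|AR_{xy}|)$ term in the bound comes from.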



3.3 The AHRSZ Algorithm 

The algorithm of Alpern et al. employs a special data structure, due to Dietz
and Sleator [7], to implement a priority space which permits new priorities to
be created between existing ones in $O(1)$ worst-case time. This is a significant
departure from the other two algorithms. Like PK, the algorithm employs a
forward and backward search. We now examine each stage in detail, assuming
an invalidating edge insertion $x \to y$:

procedure add_edge(x, y)
    lb = n2i[y]; ub = n2i[x];
    if lb < ub then dfs(y); shift();

procedure dfs(n)
    mark n as visited;
    forall n → s ∈ E do
        if n2i[s] = ub then abort; // cycle
        if s not visited ∧ n2i[s] < ub then dfs(s);

procedure shift()
    L = ∅; shift = 0;
    for i = lb to ub do
        w = i2n[i]; // w is node at topological index i
        if w marked visited then unmark w; push(w, L); shift = shift + 1;
        else n2i[w] = i − shift; i2n[i − shift] = w;
    for j = 0 to |L| − 1 do
        n2i[L[j]] = i − shift; i2n[i − shift] = L[j]; i = i + 1;

Fig. 2. The MNR algorithm.



Discovery: The set of nodes, $K$, to be reprioritised is determined by
simultaneously searching forward from y and backward from x. During this,
nodes queued for visitation by the forward (backward) search are said to be
on the forward (backward) frontier. At each step the algorithm extends the
frontiers toward each other. The forward (backward) frontier is extended by
visiting a member with the lowest (largest) priority. Consider the following:

[Figure: an example order with the backward frontier and the forward frontier marked.]

Initially, each frontier consists of a single starting node, determined by the
invalidating edge, and the algorithm proceeds by extending each:

[Figure: the same example after one step, showing the backward and forward frontiers.]

Here, members of each frontier are marked with a dot. We see the forward
frontier has been extended by visiting y and this results in a, e being added and
y removed. In the next step, a will be visited as it has the lowest priority of any
on the frontier. Likewise, the backward frontier will be extended next time by
visiting b as it has the largest priority. Thus, we see that the two frontiers are
moving toward each other and the search stops either when one frontier is empty
or when they “meet”, that is, when each node on the forward frontier has a priority
greater than any on the backward frontier. An interesting point here is that the
frontiers may meet before $R_B$ and $R_F$ have been fully identified. Thus, the discovery
stage may identify fewer nodes than that of algorithm PK. In fact, it can be shown
that at most $O(|K_{min}|)$ nodes are visited [1], giving an
$O(\|K_{min}\| \log \|K_{min}\|)$ bound on discovery. The log factor arises from the use
of priority queues to implement the frontiers, which we assume are heaps.
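
The frontier-based search just described can be sketched in Python with two heaps. This is a deliberately simplified illustration: it assumes priorities are plain integers held in a dictionary `ord_` (not the Dietz and Sleator structure), it omits cycle detection, and it extends both frontiers at every step. The name `ahrsz_discover` is hypothetical.

```python
import heapq


def ahrsz_discover(succ, pred, ord_, x, y):
    """Return the nodes visited before the two frontiers meet or one empties."""
    forward = [(ord_[y], y)]      # min-heap: lowest priority is visited first
    backward = [(-ord_[x], x)]    # max-heap, stored with negated priorities
    queued_f, queued_b = {y}, {x}
    visited = set()
    # Continue while some forward node still has a lower priority than
    # some backward node, i.e. the frontiers have not yet met.
    while forward and backward and forward[0][0] < -backward[0][0]:
        _, f = heapq.heappop(forward)
        _, b = heapq.heappop(backward)
        visited.update((f, b))
        for s in succ[f]:
            if s not in queued_f:
                queued_f.add(s)
                heapq.heappush(forward, (ord_[s], s))
        for p in pred[b]:
            if p not in queued_b:
                queued_b.add(p)
                heapq.heappush(backward, (-ord_[p], p))
    return visited
```

In the second test case below the frontiers meet after a single step, so only x and y are visited even though further nodes are reachable from y and reach x.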

The algorithm also uses another strategy to further minimise work. Consider

[Figure: an example in which node a has high outdegree.]

where node a has high outdegree (which can be imagined as much larger).
Thus, visiting node a is expensive as its outedges must be iterated. Instead, we
could visit d, c, b in potentially much less time. Therefore, AHRSZ maintains a
counter, $C(n)$, for each node n, initialised by outdegree. Now, let x and y be
the nodes to be chosen next on the forward and backward frontiers respectively.
Then, the algorithm subtracts $\min(C(x), C(y))$ from $C(x)$ and $C(y)$, extending
the forward frontier if $C(x) = 0$ and the backward if $C(y) = 0$.

Reassignment: The reassignment process also operates in two stages.
The first is a depth-first search of $K$, those visited during discovery, and
computes a ceiling on the new priority for each node, where:

$ceiling(x) = \min(\{ord(y) \mid y \notin K \land x \to y\} \cup \{ceiling(y) \mid y \in K \land x \to y\} \cup \{+\infty\})$

In a similar fashion, the second stage of reassignment computes the floor:

$floor(y) = \max(\{ord'(x) \mid x \to y\} \cup \{-\infty\})$

Note that $ord'(x)$ is the topological ordering being generated. Once the floor
has been computed the algorithm assigns a new priority, $ord'(k)$, such that
$floor(k) < ord'(k) < ceiling(k)$. An $O(|K_{min}| \log |K_{min}| + |E(K_{min})|)$ bound
on the time for reassignment is obtained. Again, the log factor arises from the
use of a priority queue. The bound is slightly better than for discovery as only
nodes in $K$ are placed onto this queue.

Therefore, we arrive at an $O(\|K_{min}\| \log \|K_{min}\|)$ time bound on AHRSZ
[1,19]. Finally, there has been a minor improvement on the storage requirements
of AHRSZ [21], although this does not affect our discussion.

procedure add_edges(B) // B is a batch of updates
    if ∃ x → y ∈ B.[ord(y) < ord(x)] then perform standard topological sort

Fig. 3. Algorithm DFS. Note that ord is implemented as an array of size $|V|$.
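
A possible Python rendering of the batch procedure in Figure 3 is given below. It is a sketch under assumptions: `succ` is a dictionary mapping every node to its successor set, `ord_` is a dictionary rather than an array, and the "standard topological sort" is taken to be an iterative depth-first search. The names `topo_sort` and `add_edges_batch` are hypothetical.

```python
def topo_sort(succ):
    """Return {node: index} for a topological order, via iterative DFS."""
    state, post = {}, []  # state: 1 = on the DFS stack, 2 = finished
    for root in succ:
        if root in state:
            continue
        state[root] = 1
        stack = [(root, iter(succ[root]))]
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                if state.get(nxt) == 1:
                    raise ValueError("graph contains a cycle")
                if nxt not in state:
                    state[nxt] = 1
                    stack.append((nxt, iter(succ[nxt])))
                    break
            else:  # all successors handled: node is finished
                state[node] = 2
                post.append(node)
                stack.pop()
    post.reverse()
    return {v: i for i, v in enumerate(post)}


def add_edges_batch(succ, ord_, batch):
    """Insert a batch of edges; re-sort from scratch only if one invalidates ord_."""
    for x, y in batch:
        succ[x].add(y)
    if any(ord_[y] < ord_[x] for x, y in batch):
        new_order = topo_sort(succ)
        ord_.clear()
        ord_.update(new_order)
```

The cost of the full sort is paid at most once per batch, which is why this approach becomes attractive only as the batch size grows.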



3.4 Comparing PK and AHRSZ 

We can now see the difference between PK and AHRSZ is that the latter has 
a tighter complexity bound. However, there are some intriguing differences be- 
tween them which may offset this. In particular, AHRSZ relies on the Dietz and 
Sleator ordered list structure [7] and this does not come for free: firstly, it is 
difficult to implement and suffers high overheads in practice (both in time and 
space); secondly, only a certain number of priorities can be created for a given 
word size, thus limiting the maximum number of nodes. For example, only 32768
priorities (hence nodes) can be created if 32-bit integers are being used, although
with 64-bit integers the limit is a more useful $2^{31}$ nodes.

4 Experimental Study 

To experimentally compare the three algorithms, we measured their performance
over a large number of randomly generated DAGs. Specifically, we investigated
how insertion cost varies with $|V|$, $|E|$ and batch size. The latter relates to
the processing of multiple edges and, although none of the algorithms discussed
offer an advantage from this, the standard offline topological sort does. Thus,
it is interesting to consider when it becomes economical to use and we have
implemented a simple algorithm for this purpose, shown in Figure 3.

To generate a random DAG, we select uniformly from the probability space
$G_{dag}(n,p)$, a variation on $G(n,p)$ [12], first suggested in [10]:

Definition 8. The model $G_{dag}(n,p)$ is a probability space containing all graphs
having a vertex set $V = \{1, 2, \ldots, n\}$ and an edge set $E \subseteq \{(i,j) \mid i < j\}$. Each
edge of such a graph exists with a probability p independently of the others.

For a DAG in $G_{dag}(n,p)$, we know that there are at most $\frac{n(n-1)}{2}$ possible edges.

Thus, we can select uniformly from $G_{dag}(n,p)$ by enumerating each possible
edge and inserting with probability p. In our experiments, we used p = Gff- to 
generate a DAG with n nodes and expected average outdegree x. 
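
Sampling from this model can be sketched in a few lines of Python. The function name `gen_dag` and the optional `rng` parameter are assumptions made for this illustration; the value of p is left to the caller.

```python
import random


def gen_dag(n, p, rng=None):
    """Sample a DAG on vertices 1..n: each edge (i, j) with i < j exists with probability p."""
    rng = rng or random.Random()
    return [(i, j)
            for i in range(1, n + 1)
            for j in range(i + 1, n + 1)
            if rng.random() < p]
```

Because every edge points from a smaller to a larger vertex number, the identity ordering is always a valid topological order of the result, so the graph is acyclic by construction.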

Our procedure for generating a data point was to construct a random DAG
and measure the time taken to insert 5000 edges. We present the exact method
in Figure 4 and, for each data point, this was repeated 25 times with the average
taken. Note that we wanted the number of insertions measured over to increase
proportionally with $|V|$, but this was too expensive in practice. Also, for the
batch experiments, we always measured over a multiple of the batch size and
chose the least value over 5000. We also recorded the values of our complexity
parameters $|\delta_{xy}|$, $\|\delta_{xy}\|$ and $|AR_{xy}|$, in an effort to correlate the
measurements with our theoretical analysis. This was done using the same procedure
as before, but instead of measuring time, we traversed the graph on each insertion
to determine their values. These were averaged over the total number of edges
inserted for 25 runs of the procedure from Figure 4.

procedure measure_acpi(V, E, B, O)
    // measure, in B sized batches, O insertions over a DAG with V
    // nodes and E edges and we assume O = cB, for some c.
    edgeS = gen_random_acyclic_edgeset(V, E + O);
    overS = randomly select O edges from edgeS;
    G = ({1 . . . V}, edgeS − overS);
    startT = timestamp(); // start timing now
    while overS ≠ ∅
        T = randomly select B edges from overS;
        overS = overS − T;
        add_edges(T, G);
        randomly erase B edges from G;
    return (timestamp() − startT) / O;

Fig. 4. Our procedure for measuring insertion cost over a random DAG. The algorithm
maintains a constant number of edges in G in an effort to eliminate interference caused
by varying V, whilst keeping O fixed. Note that, through careful implementation, we
have minimised the cost of the other operations in the loop, which might have otherwise
interfered. In particular, erasing edges is fast (unlike adding them) and independent of
the algorithm being investigated.

Fig. 5. Experimental data on random graphs with varying $|V|$.

Fig. 7. Experimental data for varying batch sizes comparing the three algorithms
against a DFS based offline topological sort
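
The measurement loop of Figure 4 might look as follows in Python. This is only a sketch under assumptions: the algorithm under test is passed in as a callback `add_edges(batch, graph)` that must append the batch to the edge list `graph`, timing uses `time.perf_counter` rather than the timer used in the experiments, and the random edge set is drawn by sampling from all pairs (i, j) with i < j.

```python
import random
import time


def measure_acpi(v, e, b, o, add_edges, rng=None):
    """Average time per insertion for o insertions, in batches of b, on a DAG with e edges."""
    if o % b:
        raise ValueError("o must be a multiple of the batch size b")
    rng = rng or random.Random()
    candidates = [(i, j) for i in range(1, v + 1) for j in range(i + 1, v + 1)]
    edge_s = rng.sample(candidates, e + o)   # random acyclic edge set
    over_s, graph = edge_s[:o], edge_s[o:]   # edges to insert, initial graph
    start = time.perf_counter()              # start timing now
    while over_s:
        batch, over_s = over_s[:b], over_s[b:]
        add_edges(batch, graph)
        for _ in range(b):                   # erase b random edges (swap and pop)
            k = rng.randrange(len(graph))
            graph[k] = graph[-1]
            graph.pop()
    return (time.perf_counter() - start) / o
```

As in Figure 4, the number of edges in the graph returns to e after every batch, so each batch is inserted into a graph of the same density.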

Non-invalidating edges were included in all measurements and this dilutes the 
execution time and parameter counts, since all three algorithms do no work for 
these cases. Our purpose, however, was to determine what performance can be 
expected in practice, where it is unlikely all edge insertions will be invalidating. 

The data, presented in Figures 5, 6 and 7, was generated on a 900MHz Athlon
based machine with 1GB of main memory. Note, we have used some (clearly
marked) scaling factors to help bring out features of the data. The implementation
itself was in C++ and took the form of an extension to the Boost Graph
Library. The executables were compiled using gcc 3.2, with optimisation level
“-O2” and timing was performed using the gettimeofday function. Our
implementation of AHRSZ employs the $O(1)$ amortised (not $O(1)$ worst-case) time
structure of Dietz and Sleator [7]. This seems reasonable as they themselves state
it is likely to be more efficient in practice.

4.1 Discussion 

The clearest observation from Figures 5 and 6 is that PK and AHRSZ have
similar behaviour, while MNR is quite different. This seems surprising as we
expected the theoretical differences between PK and AHRSZ to be measurable.
One plausible explanation is that the uniform nature of our random graphs
makes the work saved by AHRSZ (over PK) reasonably constant. Thus, it is
outweighed by the gains from simplicity and efficiency offered by PK.

Figure 5: These graphs confirm our theoretical evaluation of MNR, whose
observed behaviour strongly relates to that of $|AR_{xy}|$. Furthermore, we expected
the average size of $AR_{xy}$ to increase linearly with $|V|$ as $|AR_{xy}| \le |V|$. Likewise,
the graphs for PK and AHRSZ correspond with those of $\|\delta_{xy}\|$. The curve for
$\|\delta_{xy}\|$ is perhaps the most interesting here. With outdegree 1, it appears to
level off and we suspect this would be true at outdegree 10, if larger values of
$|V|$ were shown. We know the graphs become sparser when $|V|$ gets larger as,
by maintaining constant outdegree, $|E|$ is increasing linearly (not quadratically)
with $|V|$. This means, for a fixed sized affected region, $|\delta_{xy}|$ goes down as $|V|$
goes up. However, the average size of the affected region is also going up and,
we believe, these two factors cancel each other out after a certain point.

Figure 6: From these graphs, we see that MNR is worst (best) overall
for sparse (dense) graphs. Furthermore, the graphs for MNR are interesting
as they level off where $|AR_{xy}|$ does not appear to. This is particularly evident
from the log plot, where $|AR_{xy}|$ is always decreasing. This makes sense as
the complexity of MNR is dependent on both $|AR_{xy}|$ and $\|\delta_{xy}\|$. So, for
low outdegree MNR is dominated by $|AR_{xy}|$, but soon $\|\delta_{xy}\|$ becomes more
significant at which point the behaviour of MNR follows this instead. This is
demonstrated most clearly in the graph with high outdegree. Note, when its
behaviour matches PK, MNR is always a constant factor faster as it performs
one depth-first search and not two. Moving on to $|AR_{xy}|$, if we consider that
the probability of a path existing between any two nodes must increase with
outdegree, then the chance of inserting an invalidating edge must decrease
accordingly. Furthermore, as each non-invalidating edge corresponds to a
zero value of $|AR_{xy}|$ in our average, we can see why $|AR_{xy}|$ goes down with
outdegree. Another interesting feature of the data is that we observe both a
positive and negative gradient for $|\delta_{xy}|$. Again, this is highlighted in the log
graph, although it can be observed in the other. Certainly, we expect $|\delta_{xy}|$ to
increase with outdegree, as the average size of the subgraph reachable from y
(the head of an invalidating edge) must get larger. Again, this is because the
probability of two nodes being connected by some path increases. However,
$|\delta_{xy}|$ is also governed by the size of the affected region. Thus, as $|AR_{xy}|$ has a
negative gradient we must eventually expect $|\delta_{xy}|$ to do so as well. Certainly,
when $|AR_{xy}| \approx |\delta_{xy}|$, this must be the case. In fact, the data suggests the
downturn happens some way before this. Note that, although $|\delta_{xy}|$ decreases,
the increasing outdegree appears to counterbalance this, as we observe that
$\|\delta_{xy}\|$ does not exhibit a negative gradient. In general, we would have liked
to examine even higher outdegrees, but the time required for this has been a
prohibitive factor.

Figure 7: These graphs compare the simple algorithm from Figure 3, 
with the offline topological sort implemented using depth-first search, to those
we are studying. They show a significant advantage is to be gained from using 
the online algorithms when the batch size is small. Indeed, the data suggests 
that they compare favourably even for sizeable batch sizes. It is important 
to realise here that, as the online algorithms can only process one edge at a 
time, their graphs are flat since they obtain no advantage from seeing the edge 
insertions in batches. 

5 Conclusion 

We have presented a new algorithm for maintaining the topological order of a 
graph online, provided a complexity analysis, correctness proof and shown it 
performs better, for sparse graphs, than any previously known. Furthermore, we 
have provided the first empirical comparison of algorithms for this problem over 
a large number of randomly generated acyclic digraphs. 

For the future, we would like to investigate performance over different classes
of random graphs (e.g. using the locality factor from [10]). We are also aware that
random graphs may not reflect real life structures and, thus, experimentation on
physically occurring structures would be useful. Another area of interest is the
related problem of dynamically identifying strongly connected components and
we have shown elsewhere how MNR can be modified for this purpose [17]. We
refer the reader to [16], where a more thorough examination of the work in this
paper and a number of related issues can be found. Finally, Irit Katriel has since
shown that algorithm PK is worst-case optimal, with respect to the number of
nodes reordered over a series of edge insertions [13].


Abstract. We consider two coloring problems: interval coloring and
max-coloring for chordal graphs. Given a graph $G = (V, E)$ and positive
integral vertex weights $w : V \to \mathbb{N}$, the interval coloring problem seeks to
find an assignment of a real interval $I(u)$ to each vertex $u \in V$ such that
two constraints are satisfied: (i) for every vertex $u \in V$, $|I(u)| = w(u)$ and
(ii) for every pair of adjacent vertices $u$ and $v$, $I(u) \cap I(v) = \emptyset$. The goal is
to minimize the span $|\bigcup_{v \in V} I(v)|$. The max-coloring problem seeks to find
a proper vertex coloring of G whose color classes $C_1, C_2, \ldots, C_k$ minimize
the sum of the weights of the heaviest vertices in the color classes,
that is, $\sum_{i=1}^{k} \max_{v \in C_i} w(v)$. Both problems arise in efficient memory
allocation for programs. Both problems are NP-complete even for interval
graphs, though both admit constant-factor approximation algorithms
on interval graphs. In this paper we consider these problems for chordal
graphs. There are no known constant-factor approximation algorithms for
either interval coloring or for max-coloring on chordal graphs. However,
we point out in this paper that there are several simple $O(\log(n))$-factor
approximation algorithms for both problems. We experimentally evaluate
and compare three simple heuristics: first-fit, best-fit, and a heuristic
based on partitioning the graph into vertex sets of similar weight. Our
experiments show that in general first-fit performs better than the other
two heuristics and is typically very close to OPT, deviating from OPT
by about 6% in the worst case for both problems. Best-fit provides some
competition to first-fit, but the graph partitioning heuristic performs
significantly worse than either. Our basic data comes from about 10000
runs of each of the three heuristics for each of the two problems on
randomly generated chordal graphs of various sizes and sparsity.



1 Introduction 

Interval coloring. Given a graph $G = (V, E)$ and positive integral vertex weights
$w : V \to \mathbb{N}$, the interval coloring problem seeks to find an assignment of an
interval $I(u)$ to each vertex $u \in V$ such that two constraints are satisfied: (i)
for every vertex $u \in V$, $|I(u)| = w(u)$ and (ii) for every pair of adjacent vertices
$u$ and $v$, $I(u) \cap I(v) = \emptyset$. The goal is to minimize the span $|\bigcup_{v \in V} I(v)|$. The
interval coloring problem has a fairly long history dating back at least to the
1970s. For example, Stockmeyer showed in 1976 that the interval coloring problem
is NP-complete even when restricted to interval graphs and vertex weights in
{1,2} (see problem SR2 in Garey and Johnson [1]). The main application of
the interval coloring problem is in the compile-time memory allocation problem.
Fabri [2] made this connection in 1979. In order to reduce the total memory
consumption of source-code objects (simple variables, arrays, structures), the
compiler can make use of the fact that the memory regions of two objects are
allowed to overlap provided that the objects do not “interfere” at run-time.
This problem can be abstracted as the interval coloring problem, as follows. The
source-code objects correspond to vertices of our graph, run-time interference
between pairs of source-code objects is represented by edges of the graph, the
amount of memory needed for each source-code object is represented by the
weight of the corresponding vertex, and the assignment of memory regions to
source-code objects is represented by the assignment of intervals to vertices of the
graph. Minimizing the size of the union of intervals corresponds to minimizing
the amount of memory allocated.

If we restrict our attention to straight-line programs, that is, programs without
loops or conditional statements, then the compile-time memory allocation
problem can be modeled as the interval coloring problem for interval graphs.
Since the interval coloring problem is NP-complete for interval graphs, research
has focused on approximation algorithms. The current best approximation factor is
due to Buchsbaum et al. [3], who give a $(2 + \epsilon)$-approximation algorithm for the
problem.

Max-coloring. Like interval coloring, the max-coloring problem takes as input
a vertex-weighted graph $G = (V, E)$ with weight function $w : V \to \mathbb{N}$. The
problem requires that we find a proper vertex coloring of G whose color classes
$C_1, C_2, \ldots, C_k$ minimize the sum of the weights of the heaviest vertices in the
color classes, that is, $\sum_{i=1}^{k} \max_{v \in C_i} w(v)$. The max-coloring problem models the
problem of minimizing the total buffer size needed for memory management in
different applications. For example, [4] uses max-coloring to minimize buffer size
in digital signal processing applications. In [5], max-coloring models the problem
of minimizing buffer size needed by memory managers for wireless protocol stacks
like GPRS or 3G. In general, programs that run with stringent memory or timing
constraints use a dedicated memory manager that provides better performance
than the general purpose memory management of the operating system. The
most commonly used memory manager design for this purpose is the segregated
buffer pool. This consists of a fixed set of buffers of various sizes with buffers of
the same size linked together in a linked list. As each memory request arrives, it
is satisfied by a buffer whose size is at least as large as the size of the memory
request. The assignment of buffers to memory requests can be viewed as an
assignment of colors to the requests: all requests that are assigned the same buffer
are colored identically. Requests that do not interfere with each other can be
assigned the same color/buffer. Thus the problem of minimizing the total size
of the buffer pool corresponds to the max-coloring problem.

[5] shows that the max-coloring problem is NP-complete for interval graphs
and presents the first constant-factor approximation algorithm for the
max-coloring problem for interval graphs. The paper makes a connection between
max-coloring and on-line graph coloring and, using a known result of Kierstead
and Trotter [6] on on-line coloring of interval graphs, obtains a 2-approximation
algorithm for interval graphs and a 3-approximation algorithm for circular arc
graphs.

Connections between interval coloring and max-coloring. Given a coloring of
a vertex weighted graph $G = (V, E)$ with color classes $C_1, C_2, \ldots, C_k$, we can
construct an assignment of intervals to vertices as follows. For each $i$, $1 \le i \le k$,
let $v_i \in C_i$ be the vertex with maximum weight in $C_i$. Let $H(1) = 0$, and for
each $i$, $2 \le i \le k$, let $H(i) = \sum_{j=1}^{i-1} w(v_j)$. For each vertex $v \in C_i$, we set
$I(v) = (H(i), H(i) + w(v))$. Clearly, no two vertices in distinct color classes have
overlapping intervals and therefore this is a valid interval coloring of G. We say
that this is the interval coloring induced by the coloring $C_1, C_2, \ldots, C_k$. The
span of this interval coloring is $\sum_{i=1}^{k} w(v_i)$, which is the same as the weight of
the coloring $C_1, C_2, \ldots, C_k$ viewed as a max-coloring. In other words, if there
is a max-coloring of weight W for a vertex weighted graph G, then there is an
interval coloring of G of the same weight.
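
The construction of the induced interval coloring can be written out directly. The Python sketch below is illustrative only; the function name `induced_intervals` and the representation of color classes as a list of lists, with weights in a dictionary `w`, are assumptions made for this example.

```python
def induced_intervals(color_classes, w):
    """Return (intervals, span) for the interval coloring induced by a coloring."""
    intervals = {}
    offset = 0  # plays the role of H(i): total weight of heaviest vertices so far
    for cls in color_classes:
        for v in cls:
            intervals[v] = (offset, offset + w[v])
        offset += max(w[v] for v in cls)
    return intervals, offset
```

For example, with classes `[["a", "b"], ["c"]]` and weights a = 3, b = 1, c = 2, the first class occupies (0, 3) and (0, 1), the second starts at 3, and the span is 3 + 2 = 5, which equals the max-coloring weight of the same coloring.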

However, in [5] we show an instance of a vertex weighted interval graph on
n vertices for which the weight of an optimal max-coloring is $\Omega(\log n)$ times the
weight of the heaviest clique. This translates into an $\Omega(\log n)$ gap between the
weight of an optimal max-coloring and the span of an optimal interval coloring
because an optimal interval coloring of an interval graph has span that is within
$O(1)$ of the weight of a heaviest clique [3].

In general, algorithms for max-coloring can be used for interval coloring with 
minor modifications to make the interval assignment more “compact.” These 
connections motivate us to study interval coloring and max-coloring in the same 
framework. 

Chordal graphs. For both the interval coloring and max-coloring problems, the 
assumption that the underlying graph is an interval graph is somewhat restrictive 
since most programs contain conditional statements and loops. In this paper 
we consider a natural generalization of interval graphs called chordal graphs. 
A graph is a chordal graph if it has no induced cycles of length 4 or more. 
Alternately, every cycle of length 4 or more in a chordal graph has a chord. 

The approximability of interval coloring and max-coloring on chordal graphs
is not yet well understood. As we point out in this paper, there are several
$O(\log(n))$-factor approximation algorithms for both problems on chordal
graphs; however, the existence of constant-factor approximation algorithms for
these problems is open.

There are many alternate characterizations of chordal graphs. One that will
be useful in this paper is the existence of a perfect elimination ordering of the
vertices of any chordal graph. An ordering $v_n, v_{n-1}, \ldots, v_1$ of the vertex set of
a graph is said to be a perfect elimination ordering if when vertices are deleted
in this order, for each $i$, the neighbors of vertex $v_i$ in the remaining graph,
$G[\{v_1, v_2, \ldots, v_i\}]$, form a clique. A graph is a chordal graph iff it has a perfect
elimination ordering. Tarjan and Yannakakis [7] describe a simple linear-time
algorithm called maximum cardinality search that can be used to determine if a
given graph has a perfect elimination ordering and to construct such an ordering
if it exists. Given a perfect elimination ordering of a graph G, the graph can be
colored by considering vertices in reverse perfect elimination order and assigning
to each vertex the minimum available color. It is easy to see that this greedy
coloring algorithm uses exactly as many colors as the size of the largest clique
in the graph and therefore produces an optimal vertex coloring.
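
The greedy coloring just described can be sketched in Python as follows. This is a simple quadratic-time illustration, not the linear-time algorithm of [7]: `mcs_order` performs a maximum cardinality search and returns vertices in visit order, which for a chordal graph is the reverse of a perfect elimination ordering. The function names and the adjacency-dictionary representation are assumptions made for this example, and the sketch does not check that the input is chordal.

```python
def mcs_order(adj):
    """Visit vertices by maximum cardinality search; returns the visit order."""
    weight = {v: 0 for v in adj}  # number of already-visited neighbours
    unvisited = set(adj)
    order = []
    while unvisited:
        v = max(unvisited, key=lambda u: weight[u])
        unvisited.remove(v)
        order.append(v)
        for u in adj[v]:
            if u in unvisited:
                weight[u] += 1
    return order


def greedy_color(adj, order):
    """Assign each vertex, in the given order, the smallest color unused by its neighbours."""
    color = {}
    for v in order:
        used = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color
```

When each vertex is colored, its already-colored neighbours form a clique, so the color chosen never exceeds the size of that clique; this is why the number of colors matches the largest clique on chordal inputs.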

Every interval graph is also a chordal graph (but not vice versa). To see this, 
take an interval representation of an interval graph and order the intervals in 
left-to-right order of their left endpoints. It is easy to verify that this gives a 
perfect elimination ordering of the interval graph. 

The rest of the paper. In this paper, we consider three simple heuristics for the 
interval coloring and max-coloring problems and experimentally evaluate their 
performance. These heuristics are: 

— First fit. Vertices are considered in decreasing order of weight and each 
vertex is assigned the first available color or interval. 

— Best fit. Vertices are considered in reverse perfect elimination order and 
each vertex is assigned the color class or interval it “fits” in best. 

— Graph partitioning. Vertices are partitioned into groups with similar 
weight and we use the greedy coloring algorithm to color each subgraph with 
optimal number of colors. The interval assignment induced by this coloring 
is returned as the solution to the interval coloring problem. 

First-fit and best-fit are fairly standard heuristics for many resource allocation
problems and have been analyzed extensively for problems such as the bin
packing problem. Using old results and a few new observations, we point out that
the first-fit heuristic and the graph partitioning heuristic provide an $O(\log n)$
approximation guarantee. The best-fit heuristic provides no such guarantee and
it is not hard to construct an example of a vertex weighted interval graph for
which the best-fit heuristic returns a solution to the max-coloring problem whose
weight is $\Omega(\sqrt{n})$ times the weight of the optimal solution.

Our experiments show that in general first-fit performs better than the other
two heuristics and is typically very close to OPT, deviating from OPT by about
6% in the worst case for both problems. Best-fit provides some competition to
first-fit, but the graph partitioning heuristic performs significantly worse than
either. Our basic data comes from about 10000 runs of each of the three
heuristics for each of the two problems on randomly generated chordal graphs
of various sizes and sparsity.

Our experiments also reveal that best-fit performs better on chordal graphs
that are “irregular”, while the performance of first-fit deteriorates slightly. In
all other cases, first-fit is the best algorithm. Here, “regularity” refers to the
variance in the sizes of maximal cliques: the greater this variance, the more
irregular the graph.



2 The Algorithms 

In this section we describe three simple algorithms for the interval coloring and 
max-coloring problems. 

2.1 Algorithm 1: First-Fit in Weight Order 

For the interval coloring problem, we preprocess the vertices and “round up” 
their weights to the nearest power of 2. Then, for both problems we order the 
vertices of the graph in non-increasing order of weights. Let vi,V 2 , ■ ■ ■ ,Vnhe this 
ordering. We process vertices in this order and use a “first-fit heuristic” to assign 
intervals and colors to vertices to solve the interval coloring and max-coloring 
problem respectively. 

The algorithm for interval coloring is as follows. To each vertex we assign 
a real interval with non-negative endpoints. To vertex vi, we assign (0,w(ui)). 
When we get to vertex Vi, i > 1, each vertex Vj, I < j < i— I has been assigned an 
interval I{vj). Let Ui be the union of the intervals already assigned to neighbors 
of Vi- Then (0,oo) — Ui is a non-empty collection of disjoint intervals. Because 
the weights are powers of 2 and vertices are considered in non-increasing order 
of weights, every interval in (0,oo) — Ui has length at least w{vi). Of these, 
pick an interval / = (a, 6) with smallest right endpoint and assign the interval 
(a , a + w{vi)) to Vi- This is I{vi). 

For a solution to the max-coloring problem, we assume that the colors to be
assigned to vertices are natural numbers, and assign to each vertex $v_i$ the
smallest color not already assigned to a neighbor of $v_i$. We denote the two
algorithms described above by FFI (short for first-fit by weight order for
interval coloring) and FFM (short for first-fit by weight order for
max-coloring) respectively.
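
As a companion sketch for FFM, the following Python fragment colors vertices
in non-increasing weight order and then evaluates the max-coloring objective
(the sum, over color classes, of the heaviest vertex in each class). The
function names and the adjacency-dictionary representation are assumptions
made for illustration only.

```python
from typing import Dict, Hashable, Set

def first_fit_max_coloring(adj: Dict[Hashable, Set[Hashable]],
                           weight: Dict[Hashable, float]) -> Dict[Hashable, int]:
    """Give each vertex the smallest color unused by its colored neighbors."""
    order = sorted(adj, key=lambda v: -weight[v])  # non-increasing weight
    color: Dict[Hashable, int] = {}
    for v in order:
        used = {color[u] for u in adj[v] if u in color}
        c = 1
        while c in used:
            c += 1
        color[v] = c
    return color

def max_coloring_weight(color: Dict[Hashable, int],
                        weight: Dict[Hashable, float]) -> float:
    """Sum over color classes of the maximum vertex weight in the class."""
    heaviest: Dict[int, float] = {}
    for v, c in color.items():
        heaviest[c] = max(heaviest.get(c, 0), weight[v])
    return sum(heaviest.values())
```

For a path on three vertices with weights 5, 3, 4 the two endpoints share
color 1 and the middle vertex gets color 2, for a total weight of 5 + 3 = 8.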

We now observe that both algorithms provide an $O(\log n)$-approximation
guarantee. The following result is a generalization of the result from [8].

Theorem 1. Let $\mathcal{C}$ be a class of graphs and suppose there is a function
$\alpha(n)$ such that the first-fit on-line graph coloring algorithm colors any
$n$-vertex graph $G$ in $\mathcal{C}$ with at most $\alpha(n) \cdot \chi(G)$ colors. Then, for any
$n$-vertex graph $G$ in $\mathcal{C}$ the FFI algorithm produces a solution with span at
most $2\alpha(n) \cdot OPT_I(G)$, where $OPT_I(G)$ is the optimal span of any feasible
assignment of intervals to vertices.

The following is a generalization of the result from [5] . 

Theorem 2. Let $\mathcal{C}$ be a class of graphs and suppose there is a function
$\alpha(n)$ such that the first-fit on-line graph coloring algorithm colors any
$n$-vertex graph $G$ in $\mathcal{C}$ with at most $\alpha(n) \cdot \chi(G)$ colors. Then, for any
$n$-vertex graph $G$ in $\mathcal{C}$ the FFM algorithm produces a solution with weight
at most $\alpha(n) \cdot OPT_M(G)$, where $OPT_M(G)$ is the optimal weight of any proper
vertex coloring of $G$.




404 S.V. Pemmaraju, S. Penumatcha, and R. Raman 



Irani [9] has shown that the first-fit graph coloring algorithm uses at most
$O(\log n) \cdot \chi(G)$ colors for any $n$-vertex chordal graph $G$. This fact together
with the above theorems implies that FFI and FFM provide $O(\log n)$-approximation
guarantees.

An example that is tight for both algorithms is easy to construct. Let
$T_0, T_1, T_2, \ldots$ be a sequence of trees where $T_0$ is a single vertex and
$T_i$, $i > 0$, is constructed from $T_{i-1}$ as follows. Let
$V(T_{i-1}) = \{u_1, u_2, \ldots, u_k\}$. To construct $T_i$, start with $T_{i-1}$ and add
vertices $\{v_1, v_2, \ldots, v_k\}$ and edges $\{u_j, v_j\}$ for all $j = 1, 2, \ldots, k$.
Thus the leaves added to $T_i$ are $\{v_1, v_2, \ldots, v_k\}$ and every other vertex
in $T_i$ has a neighbor $v_j$ for some $j$. Now consider a tree $T_n$ in this sequence.
Clearly, $|V(T_n)| = 2^n$. Assign to each vertex in $T_n$ a unit weight. To construct
an ordering on the vertices of $T_n$, first delete the leaves added in the last
step. This leaves the tree $T_{n-1}$. Recursively construct the ordering on the
vertices of $T_{n-1}$, and prepend to this the deleted leaves of $T_n$ in some order.
It is easy to see that the first-fit coloring algorithm that considers the
vertices of $T_n$ in this order uses $n + 1$ colors, which is logarithmic in the
number of vertices. As a result, both FFI and FFM have cost $n + 1$, whereas OPT
in both cases is 2. See Figure 1 for $T_0$, $T_1$, $T_2$, and $T_3$.
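
The construction can be checked experimentally with a short Python sketch.
The representation (integer vertex labels, adjacency sets) and the function
names are assumptions chosen for illustration. With $T_0$ counted as the
single-vertex tree, the sketch observes one additional color per level, that
is, $n + 1$ colors on $T_n$, which is logarithmic in the $2^n$ vertices.

```python
from typing import Dict, List, Set, Tuple

def build_tree_and_order(n: int) -> Tuple[Dict[int, Set[int]], List[int]]:
    """Build T_n and the bad vertex order (newest leaves first, T_0 last)."""
    adj: Dict[int, Set[int]] = {0: set()}
    levels: List[List[int]] = [[0]]      # levels[i] = vertices added to form T_i
    for _ in range(n):
        new_leaves = []
        for u in list(adj):              # snapshot: every vertex of T_{i-1}
            v = len(adj)                 # fresh label for the new leaf
            adj[v] = {u}
            adj[u].add(v)
            new_leaves.append(v)
        levels.append(new_leaves)
    order = [v for level in reversed(levels) for v in level]
    return adj, order

def first_fit_colors(adj: Dict[int, Set[int]], order: List[int]) -> int:
    """Number of colors first-fit uses when vertices arrive in `order`."""
    color: Dict[int, int] = {}
    for v in order:
        used = {color[u] for u in adj[v] if u in color}
        c = 1
        while c in used:
            c += 1
        color[v] = c
    return max(color.values())

if __name__ == "__main__":
    for n in range(6):
        adj, order = build_tree_and_order(n)
        print(n, len(adj), first_fit_colors(adj, order))
```

Since every tree with at least one edge is 2-colorable, the gap between the
printed color count and the optimum grows with the depth of the construction.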



Fig. 1. The family of tight examples for FFI and FFM.



2.2 Algorithm 2: Best-Fit in Reverse Perfect Elimination Order 

A second pair of algorithms that we experiment with is obtained by considering
vertices in reverse perfect elimination order and using a “best-fit” heuristic to
assign intervals or colors. Let $v_1, v_2, \ldots, v_n$ be the reverse of a perfect
elimination ordering of the vertices of $G$. Recall that if vertices are
considered in reverse perfect elimination order and colored, using the smallest
color at each step, we get an optimal coloring of the given chordal graph. This
essentially implies that the example of a tree with unit weights that forced FFI
and FFM into worst case behavior will not be an obstacle for this pair of
algorithms.
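
The paper does not say how the perfect elimination ordering is computed. One
standard option, used here purely as an assumption, is maximum cardinality
search (MCS): on a chordal graph, the order in which MCS visits vertices is
the reverse of a perfect elimination ordering. The sketch below pairs it with
smallest-color greedy coloring; all names are hypothetical.

```python
from typing import Dict, Hashable, List, Set

def mcs_order(adj: Dict[Hashable, Set[Hashable]]) -> List[Hashable]:
    """Maximum cardinality search visit order (a reverse PEO if G is chordal)."""
    score = {v: 0 for v in adj}          # number of already visited neighbors
    unvisited = set(adj)
    order: List[Hashable] = []
    while unvisited:
        v = max(unvisited, key=lambda x: score[x])
        order.append(v)
        unvisited.remove(v)
        for u in adj[v]:
            if u in unvisited:
                score[u] += 1
    return order

def greedy_coloring(adj: Dict[Hashable, Set[Hashable]],
                    order: List[Hashable]) -> Dict[Hashable, int]:
    """Smallest available color for each vertex, in the given order."""
    color: Dict[Hashable, int] = {}
    for v in order:
        used = {color[u] for u in adj[v] if u in color}
        c = 1
        while c in used:
            c += 1
        color[v] = c
    return color
```

In a reverse perfect elimination order the earlier neighbors of each vertex
form a clique, so the greedy coloring never needs more colors than the size
of the largest clique. This simple version of MCS takes quadratic time; a
bucket-based variant would be needed for linear time.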

The algorithm for interval coloring is as follows. As before, to each vertex we
assign a real interval with non-negative endpoints and to vertex $v_1$, we assign
$(0, w(v_1))$. When we get to vertex $v_i$, $i > 1$, each vertex $v_j$, $1 \le j \le i-1$,
has been assigned an interval $I(v_j)$. Let $M = |\bigcup_{j=1}^{i-1} I(v_j)|$ and let $U_i$
be the union of the intervals $I(v_j)$, where $1 \le j \le i-1$ and $v_j$ is a neighbor
of $v_i$. If $U_i = (0, M)$, then $v_i$ is assigned the interval $(M, M + w(v_i))$.
Otherwise, if $U_i \ne (0, M)$, then $(0, M) - U_i$ is a non-empty collection of
disjoint intervals. However, since the vertices were not processed in weight
order, we are no longer guaranteed that there is any interval in $(0, M) - U_i$
with length at least $w(v_i)$. There are two cases.

Case 1. If there is an interval in $(0, M) - U_i$ of length at least $w(v_i)$, then
pick an interval $I \in (0, M) - U_i$ of smallest length such that $|I| \ge w(v_i)$.
Suppose $I = (a, b)$. Then assign the interval $(a, a + w(v_i))$ to $v_i$.

Case 2. Otherwise, if all intervals in $(0, M) - U_i$ have length less than $w(v_i)$,
pick the largest interval $I = (a, b)$ in $(0, M) - U_i$ (breaking ties arbitrarily)
and assign $(a, a + w(v_i))$ to $v_i$. Note that this assignment of an interval to
$v_i$ causes the interval assignment to become infeasible. This is because there
is some neighbor of $v_i$ that has been assigned an interval with left endpoint $b$
and $(a, a + w(v_i))$ intersects this interval. To restore feasibility, we increase
the endpoints of all intervals “above” $b$ by $\Delta = (a + w(v_i)) - b$. In other
words, for every vertex $v_j$, $1 \le j < i$, if $I(v_j) = (c, d)$, where $c \ge b$, then
$I(v_j)$ is reset to the interval $(c + \Delta, d + \Delta)$.
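
The two cases can be sketched in Python as follows. This is an illustrative
reading of the description above, not the authors' code: the vertex order is
passed in (it is assumed to be a reverse perfect elimination ordering), $M$ is
taken to be the largest right endpoint used so far, and the function name is
hypothetical.

```python
from typing import Dict, Hashable, List, Set, Tuple

Interval = Tuple[float, float]

def best_fit_intervals(adj: Dict[Hashable, Set[Hashable]],
                       weight: Dict[Hashable, float],
                       order: List[Hashable]) -> Dict[Hashable, Interval]:
    placed: Dict[Hashable, Interval] = {}
    for v in order:
        w = weight[v]
        m = max((hi for _, hi in placed.values()), default=0)   # current span M
        busy = sorted(placed[u] for u in adj[v] if u in placed)
        # Collect the gaps of (0, M) not covered by neighbor intervals.
        gaps: List[Interval] = []
        a = 0
        for lo, hi in busy:
            if lo > a:
                gaps.append((a, lo))
            a = max(a, hi)
        if a < m:
            gaps.append((a, m))
        if not gaps:                       # U_i = (0, M): extend the span
            placed[v] = (m, m + w)
            continue
        fitting = [g for g in gaps if g[1] - g[0] >= w]
        if fitting:                        # Case 1: tightest gap that fits
            start, _ = min(fitting, key=lambda g: g[1] - g[0])
        else:                              # Case 2: largest gap, then shift
            start, end = max(gaps, key=lambda g: g[1] - g[0])
            delta = start + w - end
            for u, (lo, hi) in list(placed.items()):
                if lo >= end:
                    placed[u] = (lo + delta, hi + delta)
        placed[v] = (start, start + w)
    return placed
```

As a small hypothetical instance (not necessarily the graph of Figure 2), take
edges A-B, A-D, B-D, A-E, D-E with weights A = 10, B = 5, D = 5, E = 10 and
order A, B, D, E. When E is processed the only gap is (10, 15), so D is moved
from (15, 20) to (20, 25) and E receives (10, 20).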

Consider the chordal graph shown in Figure 2. The numbers next to vertices 
are vertex weights and the letters are vertex labels. The ordering of vertices 
A, B, C, D, E is a reverse perfect elimination ordering. By the time we get to
processing vertex E, the assignment of intervals to vertices is as shown in the 
middle in Figure 2. When E is processed, we look for “space” to fit it in and find 
the interval (10, 15), which is not large enough for E. So we move the interval 
I{D) up by 5 units to make space for I{E) and obtain the assignment shown on 
the right. 

Fig. 2. The best-fit heuristic in action for interval coloring.

A similar “best-fit” solution to the max-coloring problem is obtained as
follows. Let $k$ be the size of a maximum clique in $G$. Start with a palette of
colors $C = \{1, 2, \ldots, k\}$ and an assignment of color 1 to vertex $v_1$. Let
$AC(v_i) \subseteq C$ be the colors available for $v_i$. For each color $j$, let $W_j$ denote
the maximum weight among all vertices colored $j$; for an empty color class $j$,
$W_j = 0$. From the subset of $AC(v_i)$ of available colors whose weights are at
least as large as $w(v_i)$, pick the color class $j$ for $v_i$ whose weight is the
smallest. If no such color exists, color vertex $v_i$ with a color $j \in AC(v_i)$ for
which $W_j$ is maximum, with ties broken arbitrarily. This ensures that the color
we assign to $v_i$ minimizes the increase in the weight of the coloring.
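
A minimal Python sketch of this color-selection rule is given below. It
assumes the caller supplies a reverse perfect elimination ordering and the
maximum clique size $k$, so that at least one color is always available; the
function name and data layout are illustrative assumptions.

```python
from typing import Dict, Hashable, List, Set

def best_fit_max_coloring(adj: Dict[Hashable, Set[Hashable]],
                          weight: Dict[Hashable, float],
                          order: List[Hashable],
                          k: int) -> Dict[Hashable, int]:
    color: Dict[Hashable, int] = {}
    class_weight = {c: 0 for c in range(1, k + 1)}   # W_j, 0 for empty classes
    for v in order:
        taken = {color[u] for u in adj[v] if u in color}
        available = [c for c in class_weight if c not in taken]   # AC(v)
        heavy_enough = [c for c in available if class_weight[c] >= weight[v]]
        if heavy_enough:
            # Lightest class that already "pays" for v: no increase in weight.
            c = min(heavy_enough, key=lambda x: class_weight[x])
        else:
            # Otherwise raise the heaviest available class as little as possible.
            c = max(available, key=lambda x: class_weight[x])
        color[v] = c
        class_weight[c] = max(class_weight[c], weight[v])
    return color
```

For example, with a single edge x-y, an isolated vertex z, weights 8, 3, 2 and
$k = 2$, vertex z joins the class of y (weight 3) rather than the class of x
(weight 8), since both classes absorb it at no cost and the lighter one is
preferred.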

We will call these “best-fit” algorithms for interval coloring and max-coloring,
BFI and BFM respectively. It is not hard to construct an example of a vertex
weighted interval graph for which BFM returns a solution whose weight is
$\Omega(\sqrt{n})$ times OPT. This example does not appear in this paper due to lack of
space.



2.3 Algorithm 3: Via Graph Partitioning 

Another pair of algorithms for interval coloring and max-coloring can be obtained 
by partitioning the vertices of the given graph into groups with similar weight. 
Let $W$ be the maximum vertex weight. Fix an integer $1 \le k \le \log_2 W$ and
partition the range $[1, W]$ into $(k + 1)$ subranges:

$$\left[1, \frac{W}{2^k}\right],\ \left(\frac{W}{2^k}, \frac{W}{2^{k-1}}\right],\ \ldots,\ \left(\frac{W}{4}, \frac{W}{2}\right],\ \left(\frac{W}{2}, W\right].$$

For $i$, $1 \le i \le k$, let $R_i = (W/2^i, W/2^{i-1}]$ and let $R_{k+1} = [1, W/2^k]$.
Partition the vertex set $V$ into subsets $V_i$, $1 \le i \le k+1$, defined as
$V_i = \{v \in V \mid w(v) \in R_i\}$. For each $i$, $1 \le i \le k+1$, let $G_i$ be the induced
subgraph $G[V_i]$. We ignore the weights and color each subgraph $G_i$ with the
fewest number of colors, using a fresh palette of colors for each subgraph $G_i$.
For the max-coloring problem, we simply use this coloring as the solution. The
solution to the interval coloring problem is simply the interval assignment
induced by the coloring.
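
The partitioning step can be illustrated with the following Python sketch,
which only computes the classes $V_1, \ldots, V_{k+1}$; coloring each induced
subgraph optimally is left out. The helper names are hypothetical and weights
are assumed to be at least 1.

```python
from typing import Dict, Hashable, List

def weight_class(w: float, W: float, k: int) -> int:
    """Index i with w in R_i = (W/2^i, W/2^(i-1)], or k+1 if w <= W/2^k."""
    for i in range(1, k + 1):
        if w > W / 2 ** i:
            return i
    return k + 1

def partition_by_weight(weight: Dict[Hashable, float],
                        k: int) -> Dict[int, List[Hashable]]:
    """Group vertices into the k+1 weight classes V_1, ..., V_{k+1}."""
    W = max(weight.values())
    parts: Dict[int, List[Hashable]] = {i: [] for i in range(1, k + 2)}
    for v, w in weight.items():
        parts[weight_class(w, W, k)].append(v)
    return parts
```

With $W = 16$ and $k = 2$ the classes are $R_1 = (8, 16]$, $R_2 = (4, 8]$ and
$R_3 = [1, 4]$, so weights 16 and 9 fall in $V_1$, weights 8 and 5 in $V_2$,
and weights 4 and 1 in $V_3$.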

We will call these graph partitioning based algorithms for interval coloring 
and max-coloring, GPI and GPM respectively. 

Theorem 3. If we set $k = 2\log(n)$, then GPI and GPM produce $(4\log(n) + o(1))$-
approximations to both the interval coloring as well as the max-coloring problems.

Proof. For $i$, $1 \le i \le k$, let $\alpha_i$ be the weight of the heaviest clique in
$G[V_i]$. Let $\chi_i = \chi(G[V_i])$. Clearly, $\alpha_i \ge \chi_i \cdot W/2^i$. Let OPT refer to the
weight of an optimal max-coloring and let $OPT_i$ refer to the weight of an optimal
max-coloring restricted to vertices in $V_i$. Note that $OPT \ge OPT_i$. Since GPM
colors each $V_i$ with exactly $\chi_i$ colors and since the weight of each vertex in
$V_i$ is at most $W/2^{i-1}$, the weight of the coloring that GPM assigns to $V_i$ is at
most $\chi_i \cdot W/2^{i-1} \le 2\alpha_i \le 2 \cdot OPT_i$. Since GPM uses a fresh palette of colors
for each $V_i$, the weight of the coloring of $\bigcup_{i=1}^{k} V_i$ is at most

$$2 \cdot \sum_{i=1}^{k} OPT_i \le 2 \cdot \sum_{i=1}^{k} OPT = 4\log(n) \cdot OPT.$$

Since $k = 2\log(n)$, $W/2^k = W/n^2$. Therefore, any coloring of $V_{k+1}$ adds a
weight of at most $W/n$ to the coloring of the rest of the graph. Since $W \le OPT$,
GPM colors the entire graph with weight at most $(4\log(n) + 1/n) \cdot OPT$.

The lower bound on $\alpha_i$ that was used in the above proof for max-coloring
also applies to interval coloring and we get the same approximation factor for
interval coloring.



3 Overview of the Experiments 

3.1 How Chordal Graphs Are Generated 

We have implemented an algorithm that takes in parameters n (a positive inte- 
ger) and a (a real number in [0, 1]) and generates a random chordal graph with 
n vertices, whose sparsity is characterized by a. The smaller the value of a the 
more sparse the graph. In addition, the algorithm can run in two modes; in mode 
1 it generates somewhat “regular” chordal graphs and in mode 2 it generates 
somewhat “irregular” chordal graphs. 

The algorithm generates chordal graphs with $n, (n-1), \ldots, 2, 1$ as a perfect
elimination ordering. In the $i$th iteration of the algorithm vertex $i$ is
connected to some subset of the vertices in $\{1, 2, \ldots, i-1\}$. Let $G_{i-1}$ be the
graph containing vertices $1, 2, \ldots, (i-1)$, generated after iteration $(i-1)$.
Let $\{C_1, C_2, \ldots, C_t\}$ be the set of maximal cliques in $G_{i-1}$. It is well known
that any chordal graph on $n$ vertices has at most $n$ maximal cliques. So we
explicitly maintain the list of maximal cliques in $G_{i-1}$. We pick a maximal
clique $C_j$ and a random subset $S \subseteq C_j$ and connect $i$ to the vertices in $S$.
This ensures that the neighbors of $i$ in $\{1, 2, \ldots, i-1\}$ form a clique, thereby
ensuring that $n, (n-1), \ldots, 2, 1$ is a perfect elimination ordering.

We use the parameter $a$ in order to pick the random subset $S$. For each
$v \in C_j$, we independently add $v$ to set $S$ with probability $a$. This makes the
expected size of $S$ equal $a \cdot |C_j|$. The algorithm also has a choice to make on
how to pick $C_j$. One approach is to choose $C_j$ uniformly at random from the
set $\{C_1, C_2, \ldots, C_t\}$. This is mode 1 and it leads to “regular” random chordal
graphs, that is, random chordal graphs in which the sizes of maximal cliques show
small variance. Another approach is to choose a maximal clique with largest size
from among $\{C_1, C_2, \ldots, C_t\}$. This is mode 2 and it leads to more “irregular”
random chordal graphs, that is, random chordal graphs in which there are a
small number of very large maximal cliques and many small maximal cliques.
Graphs generated in the two modes seem to be structurally quite different. This
is illustrated in Table 1, where we show information associated with 10 instances
of graphs with $n = 250$ and $a = 0.9$ generated in mode 1 and in mode 2. Each
column corresponds to one of the 10 instances and comparing corresponding
mode 1 and mode 2 rows easily reveals the fairly dramatic difference in
these graphs. For example, the mean clique size in mode 1 is about 8.5, while it
is about 22 in mode 2. Even more dramatic is the large difference in the variance
of the clique sizes and this justifies our earlier observation that mode 2 chordal
graphs tend to have a few large cliques and many small cliques, relative to mode
1 chordal graphs.
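
A compact Python sketch of such a generator is shown below. It follows the
description above, but the details of how the list of maximal cliques is
updated are an assumption of this sketch (the paper does not spell them out),
as are the function name and the optional seed parameter.

```python
import random
from typing import Dict, List, Optional, Set, Tuple

def random_chordal_graph(n: int, a: float, mode: int,
                         seed: Optional[int] = None
                         ) -> Tuple[Dict[int, Set[int]], List[Set[int]]]:
    """Return (adjacency, maximal cliques) of a random chordal graph on 1..n."""
    rng = random.Random(seed)
    adj: Dict[int, Set[int]] = {1: set()}
    cliques: List[Set[int]] = [{1}]
    for i in range(2, n + 1):
        if mode == 1:                    # uniformly random maximal clique
            j = rng.randrange(len(cliques))
        else:                            # mode 2: a largest maximal clique
            j = max(range(len(cliques)), key=lambda t: len(cliques[t]))
        base = cliques[j]
        # Keep each vertex of the chosen clique independently with probability a.
        s = {v for v in sorted(base) if rng.random() < a}
        adj[i] = set(s)
        for v in s:
            adj[v].add(i)
        if s == base:
            cliques[j] = s | {i}         # the old clique is no longer maximal
        else:
            cliques.append(s | {i})      # S + {i} is a new maximal clique
    return adj, cliques
```

Because the earlier neighbors of every vertex $i$ lie inside a single clique,
the order $n, n-1, \ldots, 1$ is a perfect elimination ordering by construction.
Setting $a = 1$ yields a complete graph and $a = 0$ an edgeless graph, which
makes convenient sanity checks.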



Table 1. Properties of 20 instances of graphs with $n = 250$ and $a = 0.9$. Ten of these
were generated in mode 1 and the other ten in mode 2.

MODE 1
No. of maximal cliques   149    126    UO     126    147    Il6    IT9    IT9    128    149
Size of largest clique   13     14     12     12     14     11     12     12     12     14
Size of smallest clique  4      3      5      3      5      4      4      4      3      5
Mean clique size         8.58   7.35   8.35   7.83   9.41   7.51   7.10   7.31   7.98   9.53
Variance                 4.06   3.83   3.23   2.35   3.32   1.99   2.36   2.82   3.61   3.23

MODE 2
No. of maximal cliques   220    216    216    2TS    2TS    2T9    2TS    2T6    2T9    219
Size of largest clique   29     33     33     31     14     31     30     36     30     30
Size of smallest clique  5      3      5      7      4      5      7      5      4      4
Mean clique size         20.13  22.34  22.37  24.00  21.15  21.83  25.17  23.70  22.80  20.89
Variance                 28.43  30.02  30.69  29.55  33.98  31.73  48.72  32.66  25.81  29.68



3.2 How Weights Are Assigned 

Once we have generated a chordal graph $G$ we assign weights to the vertices
as follows. This process is parameterized by $W$, the maximum possible weight
of a vertex. Let $k$ be the chromatic number of $G$ and let $\{C_1, C_2, \ldots, C_k\}$ be
a $k$-coloring of $G$. Since $G$ is a chordal graph, it contains a clique of size $k$.
Let $Q = \{v_1, v_2, \ldots, v_k\}$ be a clique in $G$ with $v_i \in C_i$. For each $v_i$, pick
$w(v_i)$ uniformly at random from the set of integers $\{1, 2, \ldots, W\}$. Thus the
weight of $Q$ is $\sum_{i=1}^{k} w(v_i)$. For each vertex $v \in C_i - \{v_i\}$, pick $w(v)$ uniformly
at random from $\{1, 2, \ldots, w(v_i)\}$. This ensures that $\{C_1, C_2, \ldots, C_k\}$ is a
solution to max-coloring with weight $\sum_{i=1}^{k} w(v_i)$ and the interval assignment
induced by this coloring is an interval coloring of span $\sum_{i=1}^{k} w(v_i)$. Since
$\sum_{i=1}^{k} w(v_i)$ is also the weight of the clique $Q$, which is a lower bound on OPT
in both cases, we have that $OPT = \sum_{i=1}^{k} w(v_i)$ in both cases. The advantage of
this method of assigning weights is that it is simple and gives us the value of
OPT for both problems. The disadvantage is that, in general, OPT for both
problems can be strictly larger than the weight of the heaviest clique and thus
by generating only those instances for which OPT equals the weight of the
heaviest clique, we might be missing a rich class of problem instances.
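
The weight assignment can be sketched in Python as follows. The sketch takes
the color classes and the clique representatives as given inputs (how they
are obtained is outside its scope), and the function name and seed parameter
are assumptions made for illustration.

```python
import random
from typing import Dict, Hashable, List, Optional, Sequence, Tuple

def assign_weights(color_classes: Sequence[Sequence[Hashable]],
                   clique: Sequence[Hashable],
                   W: int,
                   seed: Optional[int] = None
                   ) -> Tuple[Dict[Hashable, int], int]:
    """clique[i] must belong to color_classes[i]; returns (weights, known OPT)."""
    rng = random.Random(seed)
    weight: Dict[Hashable, int] = {}
    for cls, rep in zip(color_classes, clique):
        weight[rep] = rng.randint(1, W)            # clique vertex: uniform in 1..W
        for v in cls:
            if v != rep:
                # Other vertices of the class never exceed the representative.
                weight[v] = rng.randint(1, weight[rep])
    opt = sum(weight[rep] for rep in clique)       # weight of the clique Q
    return weight, opt
```

Since the representative is the heaviest vertex of its class, the returned
value equals both the weight of the given coloring and the weight of the
clique, which is why it can serve as OPT for these instances.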

We also tested our algorithms on instances of chordal graphs for which the
weights were assigned uniformly at random. For these instances, we use the
weight of the maximum weight clique as a lower bound for OPT.

Table 2. Results of our main experiments on mode 1 random chordal graphs,
evaluating heuristics for the max-coloring problem.

|V|  |E|      OPT      Best Fit  First Fit  Partition  (BF-OPT)/OPT (%)  (FF-OPT)/OPT (%)  (GPM-OPT)/OPT (%)
40   64.4333  2292.16  2416.64   2309.09    3184.84    5.43108           0.738751          38.9454
80   139.178  2474.74  2604.09   2525.5     3569.16    5.22658           2.05094           44.2232
120  223.756  2605.77  2827.46   2660.5     3956.09    8.50763           2.10047           51.8205
160  304.822  2682.98  2941.02   2733.71    4095.01    9.61784           1.89093           52.6293
200  394.556  2712.4   2938.49   2761.17    4107.06    8.33538           1.79792           51.4178
240  451.978  2835.52  3161.7    2882.82    4394.1     11.5033           1.66812           54.9662
280  525.389  2863.43  3145.16   2928.19    4491.21    9.83862           2.26147           56.8471
320  626.744  2775.11  3046.61   2819.38    4335.47    9.78339           1.59513           56.2268
360  715.389  2913.18  3166.28   2992.4     4667.89    8.68811           2.71944           60.2336
400  821.822  3150.07  3427.57   3197.59    5033.11    8.80934           1.50861           59.7779
440  894.922  2983.16  3292.34   3065.1     4864.44    10.3645           2.7469            63.0637
480  973.078  3060.02  3395.98   3117.88    4841.14    10.9789           1.89069           58.2062
520  1061.74  3053.21  3393.28   3120.32    4883.03    11.138            2.19805           59.9311



3.3 Main Observations 

For our main experiment we generated instances of random chordal graphs with
number of vertices $n = 10, 20, 30, \ldots, 550$. For each value of $n$, we used values
of $a = 0.1, 0.2, \ldots, 0.9$. For each of the $55 \times 9$ $(n, a)$ pairs, we generated 10
random vertex weighted chordal graphs. We ran each of the three heuristics for
the two problems and averaged the weight and span of the solutions over the
10 instances for each $(n, a)$ pair. Thus each heuristic was evaluated on 4950
instances, for each problem. The vertex weights are assigned as described above,
with the maximum weight $W$ fixed at 1000. We first conducted this experiment
for the max-coloring problem on mode 1 and mode 2 chordal graphs and
then repeated it for the interval coloring problem.

We then generated the same number of instances, but this time assigning to
each vertex a weight chosen uniformly from [0, 1000]. We repeated each of the
three heuristics for the two problems on these randomly generated instances. For
these instances, we used the maximum weight clique as a lower bound to OPT.
The initial prototyping was done in the discrete mathematics system
Combinatorica. However, the final versions of the algorithms were written in
C++, and run on a desktop running Linux release 9. The running time of the
programs is approximately 106.5 seconds of user time (measured using the Linux
time command), for all six algorithms on 4950 instances for mode-1 graphs,
including time to generate random instances.

Our data is presented in the following tables.¹ First we have two tables for
our experiments on the max-coloring problem (Tables 2 and 3), for mode 1 and
mode 2 chordal graphs respectively. These are followed by Table 4, which
summarizes the performance of the three heuristics for the max-coloring
problem on both mode 1 and mode 2 chordal graphs over all the runs. After
this we present three tables (Tables 5, 6, and 7) that contain the corresponding
information for the interval coloring problem. The data for the experiments on
graphs with randomly assigned vertex weights is presented in the appendix.
Based on all this data, we make four observations.

¹ In each table, we only show representative values due to lack of space. The
complete data is available at
http://www.cs.uiowa.edu/~rraman/chordalGraphExperiments.html.

Table 3. Results of our main experiments on mode 2 random chordal graphs,
evaluating heuristics for the max-coloring problem.

|V|  |E|      OPT      Best Fit  First Fit  Partition  (BF-OPT)/OPT (%)  (FF-OPT)/OPT (%)  (GPM-OPT)/OPT (%)
40   109.689  3094.16  3095.61   3151.24    3848.27    0.0470421         1.84506           24.3721
80   302.489  3927.78  3938.29   4023.13    5035       0.26761           2.42772           28.1895
120  524.8    4459.87  4471.22   4614.27    5758.13    0.254616          3.46199           29.11
160  799.044  4891.27  4903.94   5058       6387.23    0.259192          3.4088            30.5844
200  1052.07  5290.47  5293.46   5457.12    6976.49    0.0564958         3.15011           31.8691
240  1363.43  5390.86  5398.32   5556.09    7109.43    0.138506          3.06507           31.8795
280  1655.84  5749.18  5763.03   5937.57    7567.82    0.241001          3.2768            31.6331
320  1953.24  5776.97  5779.74   5982       7655.43    0.0480837         3.54915           32.5165
360  2261.96  5899.47  5916.26   6165.27    7847.38    0.284583          4.50549           33.0184
400  2639.39  5987.34  5992      6191.81    7905.62    0.0777566         3.41498           32.0389
440  2956.79  6161.91  6167.82   6390.23    8192.71    0.0959298         3.70538           32.9573
480  3243.42  6236.88  6251.6    6471.51    8228.43    0.236051          3.76203           31.9319
520  3645.54  6296.03  6302.76   6558.19    8388.33    0.106769          4.16382           33.232

Table 4. This table shows aggregate performance over all 4950 runs of the three
heuristics for max-coloring, separately for mode 1 and mode 2 graphs. The first
three rows correspond to the experiments where the value of OPT is known. The
next three rows correspond to the runs where the weights were assigned randomly.
In this case, the algorithms are compared against the weight of the maximum
weight clique (LB). The row “Equals OPT(LB)” lists the number of times each
heuristic produces a coloring with weight equal to OPT(LB), the row “Equals χ”
lists the number of times each heuristic produces a coloring using the minimum
number of colors, and the row “% Deviation” lists the percentage deviation of
the weight of the solution produced from OPT(LB), averaged over the 4950 runs.

                 MODE 1                  MODE 2
                 BFM     FFM     GPM     BFM     FFM     GPM
Equals OPT       1216    2965    0       4791    1515    2
Equals χ         4950    3393    1       4950    1675    6
% Deviation      10.70   1.93    58.66   0.20    4.47    39.43
Random Weights
Equals LB        53      71      0       38      57      0
Equals χ         4950    2666    3       4950    441     7
% Deviation      24.64   17.43   78.67   36.20   25.88   59.13

Table 5. Results of our main experiments on mode 1 random chordal graphs,
evaluating heuristics for the interval coloring problem.

|V|  |E|      OPT      Best Fit  First Fit  Partition  (BFI-OPT)/OPT (%)  (FFI-OPT)/OPT (%)  (GPI-OPT)/OPT (%)
40   64.4333  2292.16  2377.64   2336.58    2443.47    3.72963            1.93801            6.60126
80   139.178  2474.74  2612.68   2531.07    2709.23    5.57364            2.27588            9.47528
120  223.756  2605.77  2803.87   2680.47    2839.19    7.60237            2.86672            8.95791
160  304.822  2682.98  3032.76   2767.01    2957.28    13.0369            3.13209            10.2237
200  394.556  2712.4   3055.42   2808.32    2975.33    12.6464            3.53643            9.69375
240  451.978  2835.52  3215.13   2926.63    3106.92    13.3877            3.2132             9.57143
280  525.389  2863.43  3291.52   2955.3     3175.16    14.9502            3.20827            10.8863
320  626.744  2775.11  3235.78   2894.63    3074.89    16.5999            4.30693            10.8024
360  715.389  2913.18  3374.4    3026.26    3317.4     15.8323            3.8816             13.8756
400  821.822  3150.07  3603.43   3304.29    3573.03    14.3923            4.89584            13.4272
440  894.922  2983.16  3496.91   3112.83    3325.01    17.2219            4.347              11.4595
480  973.078  3060.02  3678.7    3182.17    3378.76    20.2181            3.99162            10.416
520  1061.74  3053.21  3846.96   3194.8     3443.17    25.997             4.63738            12.772


Table 6. Results of our main experiments on mode 2 random chordal graphs,
evaluating heuristics for the interval coloring problem.

|V|  |E|      OPT      Best Fit  First Fit  Partition  (BFI-OPT)/OPT (%)  (FFI-OPT)/OPT (%)  (GPI-OPT)/OPT (%)
40   109.689  3094.16  3138.77   3216.93    3377.4     1.44179            3.96805            9.15418
80   302.489  3927.78  4074.87   4175.86    4441.82    3.74484            6.31598            13.0874
120  524.8    4459.87  4623.33   4811.37    5110.5     3.66528            7.8814             14.5886
160  799.044  4891.27  5002.19   5306.62    5711.9     2.26776            8.49178            16.7775
200  1052.07  5290.47  5473.74   5793.81    6204.62    3.4643             9.51418            17.2793
240  1363.43  5390.86  5600.83   5896.01    6317.99    3.89507            9.3706             17.1983
280  1655.84  5749.18  5988.46   6283.7     6728.99    4.16195            9.29737            17.0426
320  1953.24  5776.97  6017.8    6364.53    6816.09    4.16885            10.1709            17.9873
360  2261.96  5899.47  6249.91   6552.94    7039.04    5.94027            11.0769            19.3166
400  2639.39  5987.34  6255.47   6616.7     7155.96    4.47815            10.5114            19.518
440  2956.79  6161.91  6446.57   6807.86    7327.6     4.6196             10.4829            18.9177
480  3243.42  6236.88  6569.51   6833.61    7423.2     5.33333            9.56782            19.0211
520  3645.54  6296.03  6573.81   6980.31    7555.84    4.41195            10.8684            20.0096



Table 7. This table shows aggregate performance over all 4950 runs of the three
heuristics for interval coloring, separately for mode 1 and mode 2 graphs. The
first two rows correspond to the case where the value of OPT is known. The next
two rows correspond to the case where the weights were assigned randomly. In the
latter case, the performance of the algorithms is compared with the weight of
the maximum weight clique (LB). The row “Equals OPT(LB)” lists the number of
times each heuristic produces a coloring with weight equal to OPT(LB) and the
row “% Deviation” lists the percentage deviation of the weight of the solution
produced from OPT(LB), averaged over the 4950 runs.

                 MODE 1                  MODE 2
                 BFI     FFI     GPI     BFI     FFI     GPI
Equals OPT       2878    3368    2044    2459    1353    282
% Deviation      9.51    2.29    7.89    6.70    9.52    17.45
Random Weights
Equals LB        1755    1568    939     467     211     110
% Deviation      26.87   10.99   16.91   25.21   24.52   30.01

[Figure 3 panels: “Max-Color Values for Mode-1 Chordal Graphs” and “Max-Color
Values for Mode-2 Chordal Graphs”.]

Fig. 3. Graphs showing values for max-coloring mode 1 and mode 2 chordal graphs.
The x-axis corresponds to the number of vertices in the graph, and the y-axis
corresponds to the max-color value. The solid line shows the value of OPT for
the different sizes of the graph, and the dashed line corresponds to the value
of the coloring produced by first-fit; the dotted line, the performance of
best-fit; and the dashed-dotted line, the performance of the graph partitioning
heuristic.

[Figure 4 panel: “Interval Color Values for Mode 1 Chordal Graphs”.]

Fig. 4. Graph showing values for interval coloring mode 1 and mode 2 chordal
graphs. The x-axis corresponds to the number of nodes in the graph, and the
y-axis corresponds to the interval color value. The solid line corresponds to
OPT; the dotted line, to the performance of best-fit; the dashed line, to the
performance of first-fit; and the dashed-dotted line, to that of the graph
partitioning heuristic.

1. First-fit in decreasing order of weights is clearly a heuristic that returns
   solutions very close to OPT for both problems. The overall percentage
   deviations from OPT, each over 4950 runs, are 1.93, 4.47, 2.29 and 9.52; the
   first two are for max-coloring on mode 1 and mode 2 chordal graphs
   respectively, and the next two are for interval coloring on mode 1 and mode 2
   graphs. Even in the experiments on graphs with randomly assigned vertex
   weights, first-fit performs better than the other algorithms overall, with
   average deviations of less than 26% from the weight of the maximum weight
   clique for both problems (Tables 4, 7 and Tables 8 and 9 in the appendix).
   Note that in these cases, the percentage deviations are an exaggeration of
   the actual amount, since the maximum weight clique can be quite small
   compared to OPT.

2. The graph partitioning heuristic is not competitive at all, relative to
   first-fit or best-fit, in any of the cases, despite the $O(\log n)$-factor
   approximation guarantee it provides.

3. Between the first-fit heuristic and the best-fit in reverse perfect
   elimination order heuristic, first-fit seems to do better in an overall
   sense. However, they exhibit opposite trends in going from mode 1 to mode 2
   graphs. Specifically, first-fit’s performance worsens, with the percentage
   deviation changing as 1.93 → 4.47 for max-coloring, while best-fit’s
   performance improves, with the percentage deviation going from 10.70 → 0.20.
   Table 4 and the graphs in Figure 3 show the performance of the three
   algorithms on mode 1 and mode 2 graphs. We see the same trend for interval
   coloring as well. The performance of first-fit worsens as we go from mode 1
   to mode 2 graphs, while best-fit’s performance improves. Table 7 and the
   graphs in Figure 4 show the deviations from OPT for interval coloring. In
   order to verify this trend, we tested the algorithms on mode 1 and mode 2
   graphs with randomly assigned vertex weights. First-fit continues to show
   this trend of deteriorating performance in going from mode 1 to mode 2 graphs
   for both max-coloring and interval coloring. However, the performance of
   best-fit is quite different. Its performance on max-coloring worsens as we go
   from mode 1 to mode 2 graphs, while it improves slightly for interval
   coloring.

4. The best-fit heuristic seems to be at a disadvantage because it is
   constrained to use exactly as many colors as the chromatic number. First-fit
   uses more colors than the chromatic number a fair number of times. Examining
   Table 4 we note that for max-coloring, first-fit uses more colors than the
   chromatic number about 31.45% and 66.16% of the time for mode 1 and mode 2
   graphs respectively.
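
The two percentages follow directly from the “Equals χ” row of Table 4 for
FFM (3393 and 1675 out of 4950 runs). A short Python check, with a
hypothetical helper name, makes the arithmetic explicit:

```python
RUNS = 4950  # instances per heuristic and problem (55 x 9 x 10)

def excess_color_share(equals_chi: int, runs: int = RUNS) -> float:
    """Percentage of runs in which more than chi colors were used."""
    return 100.0 * (runs - equals_chi) / runs

if __name__ == "__main__":
    print(f"mode 1: {excess_color_share(3393):.2f}%")  # mode 1: 31.45%
    print(f"mode 2: {excess_color_share(1675):.2f}%")  # mode 2: 66.16%
```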

4 Conclusion 

Our goal was to evaluate the performance of three simple heuristics for the
max-coloring problem and for the interval coloring problem. These heuristics
were first-fit, best-fit, and a heuristic based on graph partitioning. First-fit
outperformed the other algorithms in general, and our recommendation is that
this be the default heuristic for both problems. Despite the logarithmic
approximation guarantee it provides, the heuristic based on graph partitioning
is not competitive in comparison to first-fit or best-fit. Best-fit seems to
perform better on graphs that are more “irregular” and offers first-fit
competition for such graphs. We have also experimented with other classes of
chordal graphs such as trees and sets of disjoint cliques. Results from these
experiments are available at
http://www.cs.uiowa.edu/~rraman/chordalGraphExperiments.html

Appendix

Max-Coloring Mode-1 and Mode-2 Graphs with Random Weights 



Table 8. Results of our experiments on mode 1 random chordal graphs with random
weights, evaluating heuristics for the max-coloring problem. Here LB refers to
the maximum weight clique.

|V|  |E|      LB       Best Fit  First Fit  Partition  (BF-OPT)/OPT (%)  (FF-OPT)/OPT (%)  (GPM-OPT)/OPT (%)
80   144.656  3057.64  3759.9    3499.66    4668.01    22.9672           14.4559           52.6669
120  217.022  3177.24  3946.64   3668.24    5015.84    24.216            15.4536           57.8678
160  310.744  3263.64  4123.96   3863.77    5306.64    26.3604           18.3881           62.5987
200  369.5    3436.68  4236.58   4049.02    5544.8     23.2754           17.8179           61.3419
240  465.822  3555.23  4466.49   4171.19    5732.43    25.6314           17.3253           61.2393
280  543.056  3670.51  4664.58   4319.58    5981.14    27.0825           17.6833           62.9513
320  616.833  3683.39  4599.39   4257.07    5971.09    24.8684           15.5747           62.1086
360  725.544  3675.27  4721.42   4376.39    6133.7     28.4648           19.0768           66.8913
400  799.322  3718.68  4700.19   4412.16    6158.34    26.3941           18.6485           65.6058
440  853.956  3766.96  4733.96   4428.67    6248.27    25.6706           17.5662           65.8705
480  953.733  3708.09  4750.59   4444.38    6195.4     28.1142           19.8563           67.078

Table 9. Results of our main experiments on mode 2 random chordal graphs with
random weights, evaluating heuristics for the max-coloring problem.

|V|  |E|      LB       Best Fit  First Fit  Partition  (BF-OPT)/OPT (%)  (FF-OPT)/OPT (%)  (GPM-OPT)/OPT (%)
80   300.522  4545.92  5842.42   5302.06    6248.58    28.5201           16.6332           37.4546
120  534.711  5007.94  6674.97   6027.22    7103.28    33.2876           20.3532           41.8402
160  799.078  5477.86  7339.93   6594.4     7700.49    33.9928           20.3829           40.5749
200  1032.77  5613.7   7683.96   6792.93    8030.8     36.8786           21.0063           43.0572
240  1351.2   5877.72  8096.71   7077.89    8455.37    37.7525           20.4189           43.8545
280  1670.63  6349.47  8686.23   7656.33    9033.13    36.8026           20.5823           42.266
320  1977.26  6538.11  8888.31   7837.77    9316.49    35.9462           19.8782           42.4951
360  2259.06  6500.41  9195.9    7919.62    9315.26    41.4664           21.8326           43.3026
400  2579.76  6623.37  9270.36   8096.2     9643.78    39.9644           22.2369           45.6024
440  2991.97  6919.62  9724.36   8390.56    9996.6     40.533            21.2574           44.4674
480  3315.63  7062.68  9891.43   8462.93    10052.4    40.0522           19.8261           42.3314
Interval Coloring Mode-1 and Mode-2 Graphs with Random Weights



Table 10. Results of our main experiments on mode 1 random chordal graphs with
random weights, evaluating heuristics for the interval coloring problem.

|V|   |E|       LB        Best Fit  First Fit  Partition  (BF_I-OPT)/OPT  (FF_I-OPT)/OPT  (GP_I-OPT)/OPT
80    138.911   3025.34   3730.73   3361.86    3572.19    23.316          11.1231         18.0754
120   221.489   3246.92   4213.96   3573.64    3803.58    29.7831         10.0625         17.1441
160   304.589   3349.5    4345.77   3773.23    4031.22    29.7437         12.6506         20.353
200   388.922   3487.59   4702.46   3960.63    4250.49    34.834          13.5637         21.8747
240   465.622   3564.61   4774.09   3985.03    4317.4     33.9301         11.7943         21.1184
280   547.156   3621.37   4935.97   4079.52    4346.26    36.3012         12.6515         20.017
320   646.844   3632.78   5053.3    4158.73    4420.82    39.1029         14.4781         21.6926
360   708.389   3636.84   5215.73   4140.89    4413.28    43.4137         13.8594         21.3491
400   800.833   3790.56   5389      4357.7     4544.91    42.1691         14.962          19.9009
440   890.633   3751.67   5564.04   4290.47    4549.29    48.3086         14.3616         21.2605
480   972.989   3770.38   5594.34   4365.38    4627.62    48.3762         15.7809         22.7363
520   1034.57   3785.07   5645.89   4345.94    4671.27    49.1622         14.8182         23.4131



Table 11. Results of our main experiments on mode 2 random chordal graphs with
random weights, evaluating heuristics for the interval coloring problem.

|V|   |E|       LB        Best Fit  First Fit  Partition  (BF_I-OPT)/OPT  (FF_I-OPT)/OPT  (GP_I-OPT)/OPT
80    305.778   4599.7    5216.71   5446.9     5729.69    13.4142         18.4186         24.5666
120   534.244   5080.24   5892.5    6148.76    6414.76    15.9885         21.0327         26.2686
160   791.522   5379.14   6460.83   6590.64    6915.19    20.1089         22.5222         28.5556
200   1076.37   5854.59   6830.21   7224.81    7488.83    16.6642         23.4042         27.9139
240   1338.13   5969.88   7236.98   7329.67    7760.87    21.2249         22.7775         30.0004
280   1685.17   6280.87   7541.18   7765.64    8109.21    20.0659         23.6397         29.1097
320   1911.62   6281.07   7692.9    7842.99    8220.39    22.4776         24.8671         30.8757
360   2308.61   6597.4    8010.04   8071.09    8564.29    21.4121         22.3374         29.8131
400   2584.28   6800      8210.2    8375.16    8740.73    20.7382         23.1641         28.5402
440   2958.31   6940.89   8431.81   8543.13    9072.14    21.4803         23.0841         30.7058
480   3251.54   7010.97   8681.73   8574.38    9276.71    23.8308         22.2995         32.3171
520   3625.52   7366.5    9047.21   9139.46    9543.09    22.8156         24.0678         29.5471
Abstract. This paper deals with heuristic algorithm characterization,
which is applied to the solution of an NP-hard problem, in order to select 
the best algorithm for solving a given problem instance. The traditional 
approach for selecting algorithms compares their performance using an 
instance set, and concludes that one outperforms the other. Another 
common approach consists of developing mathematical models to relate 
performance to problem size. Recent approaches try to incorporate more 
characteristics. However, they do not identify the characteristics that af- 
fect performance in a critical way, and do not incorporate them explicitly 
in their performance model. In contrast, we propose a systematic proce- 
dure to create models that incorporate critical characteristics, aiming at 
the selection of the best algorithm for solving a given instance. To vali- 
date our approach we carried out experiments using an extensive test set. 
In particular, for the classical bin packing problem, we developed models 
that incorporate the interrelation among five critical characteristics and 
the performance of seven heuristic algorithms. As a result of applying 
our procedure, we obtained a 76% accuracy in the selection of the best 
algorithm. 



1 Introduction 

For the solution of NP-hard combinatorial optimization problems,
non-deterministic algorithms have been proposed as a good alternative for very
large instances [1]. On the other hand, deterministic algorithms are considered
adequate for small instances of these problems [2]. As a result, many
deterministic and non-deterministic algorithms have been devised for NP-hard
optimization problems. However, no adequate method is currently known for
selecting the most appropriate algorithm to solve them.

* This research was supported in part by CONACYT and COSNET.

The problem of choosing the best algorithm for a particular instance is far
from being solved, for several reasons. In particular, it is known that in
real-life situations no algorithm outperforms all the others in all
circumstances [3]. So far, theoretical research has suggested that problem
instances can be grouped into classes and that, for each class, there exists an
algorithm that solves the problems of that class most efficiently [4].
Consequently, a few researchers have tried to identify algorithm dominance
regions considering more than one problem characteristic. However, they do not
systematically identify the characteristics that affect performance in a
critical way, and do not incorporate them explicitly in a performance model.

For several years we have been working on the problem of data-object dis- 
tribution on the Internet [5], which can be seen as a generalization of the bin 
packing problem. We have designed solution algorithms and carried out a large 
number of experiments with them. As expected, no algorithm showed absolute 
superiority; hence our interest in developing an automatic method for algorithm 
selection. For this purpose, we propose a procedure for the systematic charac- 
terization of algorithm performance. 

The procedure proposed in this paper consists of four main phases: modeling
problem characteristics, grouping instance classes dominated by an algorithm,
modeling the relationship between the characteristics of the grouped instances
and the algorithm performance, and applying the relationship model to
algorithm selection for a given instance.

This paper is organized as follows. An overview of the main works on
algorithm selection is presented in Section 2. Section 3 then describes a
general mechanism to characterize algorithm performance and select the
algorithm with the best expected performance. An application problem and its
solution algorithms are described in Section 4; in particular, we use the bin
packing (BP) problem and seven heuristic algorithms (deterministic and
non-deterministic). Details of the application of our characterization
mechanism to the solution of BP instances are described in Section 5.

2 Related Work 

Recent approaches for modeling algorithm performance try to incorporate more
than one problem characteristic in order to obtain a better problem
representation. The objective is to increase the precision of the algorithm
selection process. The works described below follow this approach.

Borghetti developed a method to correlate each instance characteristic with
algorithm performance [6]. The problem dealt with was reasoning over a Bayesian
Knowledge Base (BKB), which was solved with a genetic algorithm (GA) and a
best-first search algorithm (BFS). For the BKB problem, two kinds of critical
characteristics were identified: topological and probabilistic. In this case
the countable topological characteristics are the number of nodes, the number
of arcs, and the number of random variables. An important shortcoming of this
method is that it does not consider the combined effect of all the
characteristics.

Minton proposed a method that allows specializing generic algorithms for 
particular applications [7]. The input consists of a description of the problem 
and a training instance set, which guides the search through the design space, 
constituted by heuristics that contend for their incorporation into the generic 
algorithm. The output is a program adjusted to the problem and the distribution 
of the instances. 

Fink developed an algorithm selection technique for decision problems, which 
is based on the estimation of the algorithm gain, obtained from the statistical 
analysis of their previous performance [8]. Although the estimation can be en- 
riched with new experiences, its efficiency depends on the user’s ability to define 
groups of similar problem instances and to provide an appropriate metric of the 
problem size. The relationship among the problem characteristics given by the 
user and the algorithm performance is not defined formally. 

The METAL group proposed a method to select the most appropriate
classification algorithm for a set of similar instances [9]. They identify
groups of old instances that exhibit characteristics similar to those of a new
instance group. The algorithm performance on the old instances is known and is
used to predict the best algorithms for the new instance group. The similarity
among instance groups is obtained considering three types of problem
characteristics: general, statistical, and derived from information theory.
Since they do not propose a model for relating the problem characteristics to
performance, the identification of similar instances is repeated with each new
group of data, so the processing time for algorithm selection can be high.

Rice introduced the concept of poly-algorithms [10]. That paper refers to the
use of a function that selects, from a set of algorithms, the best one for
solving a given situation. After this work, other researchers have formulated
different functions, for example those presented in [11,12]. In contrast with
these works, our solution approach integrates three aspects: 1) self-adaptation
of the functions to incorporate new knowledge; 2) a systematic method with a
statistical foundation to obtain the most appropriate functions; and 3)
modeling of the interrelation of the variables that are critical for
algorithmic performance.

Table 1 presents the most important works that consider several problem 
characteristics. Column 2 indicates if problem characteristics are modeled. Col- 
umn 3 is used to indicate if problem characteristics are incorporated explicitly 
into a performance model. Column 4 shows the granularity of the prediction, 
i.e., if the prediction can be applied for selecting the best algorithm for only one 
instance. Finally, column 5 indicates if the prediction model has been applied to 
algorithm selection. 

Notice from Table 1 that no work includes all the aspects required to
characterize algorithm performance with the aim of selecting the best algorithm
for a given instance. The use of instance characterization to form groups of
similar instances is an emerging and promising area for identifying dominance
regions of algorithms. The works of Borghetti and the METAL group have made
important advances in this area. However, the first does not combine the
identified characteristics, which hinders the selection of algorithms, and the
second is not applicable to the solution of a single instance. In contrast, our
method is the only one that considers the four main aspects of algorithm
characterization. The next section describes the method in detail.



Table 1. Related work on algorithm performance characterization

Research       Problem           Characteristics into   Performance      Selection
               characteristics   performance model      prediction for
               modeling                                 one instance
Fink
Borghetti
METAL
Our proposal











3 Automation of Algorithm Selection 

In this section a statistical method is presented for characterizing algorithm 
performance. This characterization is used to select the best algorithm for a 
specific instance. Section 3.1 describes the general software architecture, which 
is based on the procedure described in Section 3.2. 



3.1 Architecture of the Characterization and Selection Process 

The software architecture proposed for performance characterization and its ap- 
plication to algorithm selection is shown in Figure 1 and consists of five basic 
modules: 1) Statistical Sampling, 2) Instance Solution, 3) Characteristics Mod- 
eling, 4) Clustering Method, and 5) Classification Method. 

Initially, the Statistical Sampling module generates a set of representative 
instances of the optimization problem. This set grows each time a new instance 
is solved. 

The problem instances generated by the Statistical Sampling module are
solved by the Instance Solution module, which has a configurable set of
heuristic algorithms. For each instance, the performance statistics of each
algorithm are obtained. The usual metrics for quantifying the final performance
are the percentage deviation from the optimal solution value, the processing
time, and their corresponding standard deviations.

In the Characteristics Modeling module, the parameter values of each
instance are transformed into indicators of the critical characteristics of the
problem, i.e., those that impact algorithm performance.

The Clustering module forms groups of the instances for which an algorithm
had a performance similar to or better than that of the others. Each group
defines a region dominated by an algorithm. The similarity among the members of
each group is determined through the instance characteristics and the
algorithm performance.

The Classification module relates a new instance to one of the previously
created groups. The heuristic algorithm associated with the group is expected
to obtain the best performance when solving the given instance. The newly
solved instances are incorporated into the characterization process to
increase the selection quality.




Fig. 1. Architecture of the characterization and selection process



3.2 Procedure to Systematize Algorithm Selection 

We propose a procedure for systematizing the creation of mathematical models
that incorporate problem characteristics, aiming at selecting the best
algorithm for a given instance. This procedure includes the steps needed to
develop the architecture of the characterization and selection process
described above (see Figure 1). In Figure 2 the steps of the proposed procedure
are associated with the architecture.

Step 1. Representative Sampling. Develop a sampling method to generate
representative problem instances. This instance base will be used to determine
the relationship between instance characteristics and algorithm performance.

Step 2. Instance Sample Solution. Provide a set of solution algorithms. Each 
instance of the sample must be solved with each available algorithm. The average 
performance of each algorithm for each instance must be calculated carrying out 
30 experiments. Examples of performance measures are: the ratio of the best 
solution value with respect to a lower bound of the optimal solution value, and 
processing time. 




Fig. 2. Procedure to systematize algorithm selection 



Step 3. Performance Evaluation. Develop a method to evaluate the perfor- 
mance of the solution algorithms, and determine the best algorithm for each 
sample instance. An alternative for considering all the different performance 
metrics is using their weighted average. Another is choosing the algorithm with 
the best quality and, in case of tie, choosing the fastest algorithm. 

Step 4. Exploratory Problem Analysis. Establish hypotheses about the possible
critical variables. These variables represent the instance characteristics that
can be used as good indicators of algorithm performance on these instances.

Step 5. Indicator Formulation and Measurement. Develop indicator functions.
These metrics are established to measure the values of the critical
characteristics of the instances. The indicators are obtained by applying the
indicator functions to the instance parameter values. Both specific features of
the instances and algorithm performance information affect the algorithm
selection strategy. In this step, a way to extract relevant features from the
problem parameters must be found.

Step 6. Creation of Dominance Groups. Develop a method to create instance
groups, each dominated by an algorithm or an algorithm set. The similarity
among members of each group is determined through the indicators of instance
characteristics and the algorithm with the best performance for each instance.
The output of this method is a set of groups, where each group is associated
with an instance set and the algorithm with the best performance for the set.

Step 7. Performance Modeling. Develop a method to model the relationship
between problem characteristics and algorithm performance. The relationship is
learned from the groups created in Step 6.

Step 8. Algorithm Selection. Develop a method to use the performance model. 
For each new instance, its characteristic indicators must be calculated in order 
to determine which group it belongs to, using the relationship learned. The 
algorithm associated to this group is the expected best algorithm for the new 
instance. 

Step 9. Feedback. The results of the new instance, solved with all algorithms,
are fed back into the procedure. If the prediction is right, the corresponding
group is reinforced; otherwise a classification adjustment is needed.



4 Application Problem 

The bin packing problem is used to exemplify our algorithm selection
methodology. This section gives a brief description of the one-dimensional bin
packing problem and its solution algorithms.



4.1 Problem Description 

The bin packing problem is an NP-hard combinatorial optimization problem in
which there is a given sequence of n items $L = \{a_1, a_2, \ldots, a_n\}$, each
with a given size $0 < s(a_i) \le c$, and an unlimited number of bins, each of
capacity c. The question is to determine the smallest number of bins m into
which the items can be packed. Formally, determine a minimal partition of L
into $B_1, B_2, \ldots, B_m$ such that in each bin $B_j$ the aggregate size of
all the items in $B_j$ does not exceed c. This constraint is expressed in (1).

$$\sum_{a_i \in B_j} s(a_i) \le c \qquad \forall j,\ 1 \le j \le m \tag{1}$$



In this work, we consider the discrete version of the one-dimensional bin
packing problem, in which the bin capacity is an integer c, the number of items
is n, and, for simplicity, each item size $s_i$ is chosen from the set
$\{1, 2, \ldots, c\}$.

4.2 Heuristic Solution Algorithms

An optimal solution can be found by considering all the ways to partition a
set of n items into n or fewer subsets. Unfortunately, the number of possible
partitions is larger than $(n/2)^{n/2}$ [13]. The heuristic algorithms
presented in this section use deterministic and non-deterministic strategies
for obtaining suboptimal solutions with less computational effort.

Deterministic Algorithms. These algorithms always follow the same
path to arrive at the solution. For this reason, they obtain the same solution
in different executions. The deterministic approximation algorithms for bin
packing are very simple and run fast. A theoretical analysis of approximation
algorithms is presented in [14,15,16]. These surveys discuss the most important
results for the one-dimensional bin packing problem and its variants.

First Fit Decreasing (FFD). With this algorithm the items are first placed
in a list sorted in non-increasing order of size. Then each item is picked in
order from the list and placed into the first bin that has enough unused
capacity to hold it. If no partially filled bin has enough unused capacity, the
item is placed in an empty bin.

Best Fit Decreasing (BFD). The only difference from FFD is that the items
are not placed in the first bin that can hold them, but in the best-filled bin
that can hold them.

Match to First Fit (MFF). It is a variation of FFD. It asks the user to type 
the number of complementary bins. Each of these auxiliary bins is intended for 
holding items in a unique range of sizes. As the list is processed, each item is 
examined to check if it can be packed in a new bin, with items of a proper 
complementary bin; or packed in a partially-filled bin; or packed alone in a 
complementary bin. Finally, the items that are in the complementary bins are 
packed according to the basic algorithm. 

Match to Best Fit (MBF). It is a variation of BFD and similar to MFF, 
except for the basic algorithm used. 

Modified Best Fit Decreasing (MBFD). The algorithm asks for a percentage
value. This is the amount of bin capacity that can be left empty and still
qualify as a "good fit". All the items larger than 50% of the bin capacity are
placed definitively in their own bins. With each partially filled bin, a
special procedure to find a "good fit" item combination is followed. Finally,
all remaining items are packed according to BFD.
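
As an illustration of the two basic deterministic heuristics described in this subsection, the following is a minimal Python sketch of FFD and BFD. The function names and the list-of-lists representation of bins are assumptions made for this example and are not taken from the implementation used in the experiments; the matching variants (MFF, MBF) and MBFD are not covered.

```python
def first_fit_decreasing(sizes, capacity):
    """Pack items with FFD; returns a list of bins (each a list of item sizes)."""
    bins, free = [], []              # free[k] = unused capacity of bins[k]
    for s in sorted(sizes, reverse=True):
        for k, room in enumerate(free):
            if s <= room:            # first bin with enough unused capacity
                bins[k].append(s)
                free[k] -= s
                break
        else:                        # no open bin can hold the item: open a new one
            bins.append([s])
            free.append(capacity - s)
    return bins


def best_fit_decreasing(sizes, capacity):
    """Pack items with BFD: use the best-filled bin that can still hold the item."""
    bins, free = [], []
    for s in sorted(sizes, reverse=True):
        fitting = [k for k, room in enumerate(free) if s <= room]
        if fitting:
            k = min(fitting, key=lambda j: free[j])   # least unused capacity
            bins[k].append(s)
            free[k] -= s
        else:
            bins.append([s])
            free.append(capacity - s)
    return bins
```

For the item sizes 7, 4, 4, 2 and capacity 10, both heuristics use two bins, but FFD puts the item of size 2 next to the 7, whereas BFD puts it in the fuller bin holding the two 4s.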

Non-Deterministic Algorithms. These algorithms generally do not obtain the
same solution in different executions. Non-deterministic approximation
algorithms are considered general-purpose algorithms.

Ant Colony Optimization (ACO). It is inspired by the ability of real ants to
find the shortest path between their nest and a food source using a pheromone
trail. Each ant builds a partition of the item list, starting with an empty
bin. Each new bin is filled with "selected items" until no remaining item fits
in it. A "selected item" is chosen stochastically using mainly a pheromone
trail, which indicates the advantage of having a new item of size j with the
item sizes already packed. The pheromone trail evaporates a little after each
iteration and is reinforced by good solutions [17].

Threshold Accepting (TA). In this algorithm, a neighborhood $H(x) \subset X$
is associated with each $x \in X$ (where X represents the set of all feasible
solutions). Thus, given a current feasible solution x and a control parameter T
(called temperature), a neighboring feasible solution $y \in H(x)$ is
generated; if $z(y) - z(x) < T$, then y is accepted as the new current
solution, otherwise the current solution remains unchanged. The value of T is
decreased each time thermal equilibrium is reached. This condition is verified
when a set S of feasible solutions is formed. The value of T is reduced by
repeatedly multiplying it by a cooling factor $\mu < 1$ until the system is
frozen [18].
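
The acceptance rule and cooling schedule of TA can be sketched in a few lines of Python. This is only an illustrative skeleton under stated assumptions: the cost function z and the neighborhood generator are passed in as parameters, a fixed batch of moves stands in for the thermal-equilibrium test, "frozen" is taken to mean that T has dropped below a small value, and the best solution seen is remembered. All names and default values (`t0`, `mu`, `t_min`, `batch`) are hypothetical and not taken from [18].

```python
import random


def threshold_accepting(x0, cost, neighbor, t0=1.0, mu=0.9, t_min=1e-3,
                        batch=50, seed=0):
    """Generic threshold accepting loop (illustrative sketch)."""
    rng = random.Random(seed)
    x, best, t = x0, x0, t0
    while t > t_min:                      # assumed "frozen" criterion
        for _ in range(batch):            # stand-in for the equilibrium test
            y = neighbor(x, rng)          # y is drawn from the neighborhood H(x)
            if cost(y) - cost(x) < t:     # accept if not worse by T or more
                x = y
                if cost(x) < cost(best):
                    best = x              # keep the best solution seen so far
        t *= mu                           # cooling: T <- mu * T, with mu < 1
    return best
```

For bin packing, `cost` could be the number of bins used and `neighbor` a move that relocates one item to another bin; both are left to the caller here.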

5 Implementation 

This section shows the application of our procedure for algorithm characteriza- 
tion and selection. The procedure was applied to the one-dimensional bin packing 
problem (BP) and seven heuristic algorithms to solve it. 



5.1 Statistical Sampling 

In order to ensure that all problem characteristics were represented in the in- 
stances sample, stratified sampling and a sample size derived from survey sam- 
pling were used. The formation of strata is a technique that allows reducing the 
variability of the results, increasing the representativeness of the sample, and 
can help ensure consistency especially in handling clustered data [19]. 

Specifically, the following procedure was used: calculation of the sample size, 
creation of strata, calculation of the number of instances for each stratum, and 
random generation of the instances for each stratum. With this method 2,430 
random instances were generated. 

The task of solving 2,430 random instances requires a great amount of time: 
actually it took five days with four workstations. However, it is important to 
point out that this investment of time is only necessary once, at the beginning 
of the process of algorithms characterization, to create a sample of minimum 
size. We consider that this is a reasonable time for generating an initial sample 
whose size, validated statistically, increases the confidence level in the results. 
Besides, the initial quality of the prediction of the best algorithm for a particular 
instance can be increased through feedback. 

5.2 Instance Solution 

In order to learn the relationship between algorithm performance and problem
characteristics, the random instances (which were used for training purposes)
were solved. For testing the learned performance model, standard instances that
are accepted by the research community were solved. For most of them, the
optimal solution is known; otherwise the best-known solution is available. The
experimental performances obtained for standard instances were used to validate
the performances predicted by our model. Additionally, we confirmed the quality
of our heuristic algorithms with the known solutions.

The 2,430 random instances were generated using the method described in
Section 5.1, and 1,369 standard instances were considered. These instances were
solved with the seven heuristic algorithms described in Section 4. The
performance results obtained were execution time, theoretical ratio, and their
corresponding standard deviations. The theoretical ratio is one of the usual
performance metrics for bin packing: it is the ratio between the obtained
solution and the theoretical optimum, which is a lower bound of the optimal
value and equals the sum of all the item sizes divided by the bin capacity.

For each sample instance, all the algorithms were evaluated in order to deter- 
mine the best algorithm. For a given instance, the algorithm with the smallest 
performance value was chosen, assigning the largest priority to the theoretical 
ratio. 
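
The theoretical ratio and the choice of the best algorithm for one instance can be illustrated with a short Python sketch. One plausible reading of "assigning the largest priority to the theoretical ratio" is a lexicographic comparison in which execution time only breaks ties; that reading, the function names, and the shape of the `results` dictionary are assumptions made for this example.

```python
def theoretical_ratio(bins_used, sizes, capacity):
    """Ratio of the obtained solution to the theoretical optimum."""
    theoretical_optimum = sum(sizes) / capacity   # lower bound, not rounded up
    return bins_used / theoretical_optimum


def best_algorithm(results):
    """Pick the best algorithm for one instance.

    results maps an algorithm name to (theoretical ratio, time in seconds).
    Tuples compare lexicographically, so the ratio decides first and the
    execution time is only used to break ties (assumed priority rule).
    """
    return min(results, key=lambda name: results[name])
```

For example, with item sizes 7, 4, 4, 2 and capacity 10 the theoretical optimum is 1.7, so a packing that uses 2 bins has a theoretical ratio of about 1.18.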



5.3 Characteristics Modeling 

In this step relevant features of the problem parameters were identified. After- 
wards, expressions to measure the values of identified critical characteristics were 
derived. 

In particular, four critical characteristics that affect algorithm performance 
were identified. The critical characteristics identified using the most common 
recommendation were instance size and item size dispersion. The critical char- 
acteristics identified using parametric analysis were capacity constraint and bin 
usage. 

Once the critical characteristics were identified, expressions to measure their 
influence on algorithm performance were derived. Expressions (2) through (6) 
show five indicators derived from the analysis of the critical variables. 

Instance size. The p indicator in (2) expresses a relationship between instance 
size and the maximum size solved. The instance size is the number of items n, 
and the maximum size solved is maxn. The value of maxn was set to 1000, 
which corresponds to the number of items of the largest instance solved in the 
specialized literature that we identified. 

Capacity constraint. The t indicator in (3) expresses a relationship between
the average item size and the bin size. The size of item i is $s_i$ and the bin
size is c. This metric quantifies the proportion of the bin capacity c that is
occupied by an item of average size.

Item size dispersion. Two indicators were derived for this variable. The d
indicator in (4) expresses the degree of dispersion of the item size values. It
is measured using the standard deviation of t. The f indicator in (5) expresses
the proportion of items whose sizes are factors of the bin capacity. In other
words, an item is a factor when the bin capacity c is a multiple of its size
$s_i$. Instances with many factors are considered easy to solve.

Bin usage. The objective function has only one term, from which the b
indicator was derived. The b indicator is shown in expression (6). This
indicator expresses the proportion of the total size that can fit in a bin of
capacity c. The inverse of this metric is used to calculate the theoretical
optimum.



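$$p = \frac{n}{maxn} \tag{2}$$

$$t = \frac{\sum_{i} (s_i / c)}{n}, \qquad 1 \le i \le n \tag{3}$$

$$d = \sigma(t) \tag{4}$$

$$f = \frac{\sum_{i} factor(c, s_i)}{n}, \qquad 1 \le i \le n \tag{5}$$

$$b = \frac{c}{\sum_{i} s_i}, \qquad 1 \le i \le n \tag{6}$$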
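
The five indicators of expressions (2) through (6) can be computed directly from an instance, as in the following Python sketch. It rests on some assumptions: item sizes are positive integers (the discrete version of the problem), d is taken to be the population standard deviation of the ratios $s_i / c$, and factor(c, s_i) is taken to be 1 when c is a multiple of s_i and 0 otherwise. The function name and the returned dictionary are illustrative only.

```python
from statistics import pstdev


def indicators(sizes, capacity, maxn=1000):
    """Compute the indicators p, t, d, f and b of one bin packing instance."""
    n = len(sizes)
    ratios = [s / capacity for s in sizes]      # s_i / c for every item
    return {
        "p": n / maxn,                          # instance size, expression (2)
        "t": sum(ratios) / n,                   # capacity constraint, (3)
        "d": pstdev(ratios),                    # item size dispersion, (4)
        "f": sum(1 for s in sizes if capacity % s == 0) / n,   # factors, (5)
        "b": capacity / sum(sizes),             # bin usage, (6)
    }
```

Under these definitions the product b * n * t equals 1, which gives a quick consistency check on computed values.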



The factor analysis technique was used to confirm if derived indicators were 
critical too. Table 2 shows the characteristic indicators and the best algorithm 
for a small instance set, which were selected from a sample with 2,430 random 
instances. 



5.4 Clustering 

K-means was used as a clustering method to create instance groups dominated, 
each one, by an algorithm. The cluster analysis was carried out using the com- 
mercial software SPSS version 11.5 for Windows. The similarity among members 
of each group was determined through: characteristics indicators of the instances 
and the algorithm with the best performance for each one (see Table 2). Five 
groups were obtained; each group was associated with an instance set and an 
algorithm with the best performance for it. Two algorithms had poor perfor- 
mance and were outperformed by the other five algorithms. This dominance 
result applies only to the instance space explored in this work. 
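
The experiments relied on SPSS for the cluster analysis; the following pure-Python sketch only illustrates the idea of grouping instances and attaching an algorithm to each group. It clusters the indicator vectors alone with a basic K-means and then labels each cluster with its most frequent best algorithm, which is a simplification assumed here, since the way the best algorithm enters the similarity measure is not detailed in the text. The deterministic initialization (first k points) and all names are likewise assumptions.

```python
from collections import Counter


def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(points, k, max_iter=100):
    """Basic K-means on indicator vectors; returns (labels, centers)."""
    centers = [list(p) for p in points[:k]]     # assumed initialization
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:                # assignments stable: stop
            break
        labels = new_labels
        for j in range(k):                      # move centers to cluster means
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


def dominant_algorithms(labels, best_algorithms):
    """Map each cluster to the algorithm that is most often best in it."""
    votes = {}
    for lab, alg in zip(labels, best_algorithms):
        votes.setdefault(lab, Counter())[alg] += 1
    return {lab: counts.most_common(1)[0][0] for lab, counts in votes.items()}
```

Each point would be the tuple of indicators (p, b, t, f, d) of one instance, and `best_algorithms` the corresponding list of best algorithm names.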



5.5 Classification 

In this investigation discriminant analysis was used as a machine learning method 
to find out the relationship between the problem characteristics and algorithm 
performance. 

Discriminant analysis extracts from the data of the group members a group
classification criterion, called discriminant functions, which is later used to
classify each new observation into the corresponding group. The percentage of
new observations that are classified correctly is an indicator of the
effectiveness of the discriminant functions. If these functions are effective
on the training sample, they are expected to classify well new observations
whose group is unknown.

Table 2. Example of random instances with their characteristics and the best algorithm

Instance       Problem size  Bin size  Item size  Factors  Item dispersion  Best
               p             b         t          f        d                algorithm
E1i10.txt      0.078         0.427     0.029      0.000    0.003            FFD
E50i10.txt     0.556         0.003     0.679      0.048    0.199            ACO
E147i10.txt    0.900         0.002     0.530      0.000    0.033            TA
E162i10.txt    0.687         0.001     0.730      0.145    0.209            BFD
E236i10.txt    0.917         0.002     0.709      0.000    0.111            TA

The analysis was carried out using the commercial software SAS version 4.10
for Windows. To obtain the classification criterion, the five indicators (see
Section 5.3) were used as independent variables, and the number of the best
algorithm as the dependent variable (or class variable). The discriminant
classifier was trained with the 2,430 bin packing instances generated with the
procedure described in Section 5.1, and validated with a resubstitution method,
which uses the same training instances.



Table 3. Classification results with 2,430 random instances

Origin group   Target group                                                          Total
               FFD           BFD           MBF           TA            ACO
FFD            600 (40.9%)   190 (13.0%)   374 (25.5%)   59 (4.0%)     243 (16.6%)   1466 (100%)
BFD            1 (4.8%)      19 (90.5%)    0 (0.0%)      0 (0.0%)      1 (4.8%)      21 (100%)
MBF            0 (0.0%)      1 (4.4%)      20 (87.0%)    0 (0.0%)      2 (8.7%)      23 (100%)
TA             0 (0.0%)      0 (0.0%)      0 (0.0%)      242 (100%)    0 (0.0%)      242 (100%)
ACO            81 (11.9%)    73 (10.8%)    103 (15.2%)   2 (0.3%)      419 (61.8%)   678 (100%)
Error rate     0.59          0.09          0.13          0.00          0.38          0.24



Table 3 presents the validation results of the obtained classifier, which are
similar to those generated by SPSS. Each cell shows the number of instances
that belong to the origin group and were classified into the target group,
together with the corresponding percentage. In this case, each algorithm
defines a group. The last column (Total) gives the number of instances that
belong to each origin group. The final row shows the error rate for each group,
which is the proportion of its instances that were misclassified. The average
error was 24%.
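
As a check on how the summary figures relate to the counts in Table 3, the
following Python sketch recomputes the error rates from the confusion matrix.
It rests on two assumptions that are our reading of the numbers rather than
something the text states: each error rate is the fraction of an origin group's
instances assigned to any other group, and the average error is the unweighted
mean of the five group rates. The function names are illustrative only.

```python
# Confusion matrix from Table 3: rows are origin groups, columns are target groups.
GROUPS = ["FFD", "BFD", "MBF", "TA", "AGO"]
COUNTS = [
    [600, 190, 374, 59, 243],
    [1, 19, 0, 0, 1],
    [0, 1, 20, 0, 2],
    [0, 0, 0, 242, 0],
    [81, 73, 103, 2, 419],
]


def group_error_rates(counts):
    """Fraction of each origin group's instances classified into another group."""
    return [1.0 - row[i] / sum(row) for i, row in enumerate(counts)]


def mean_error(counts):
    """Unweighted mean of the per-group rates (assumed meaning of 'average error')."""
    rates = group_error_rates(counts)
    return sum(rates) / len(rates)


rates = group_error_rates(COUNTS)
for name, rate in zip(GROUPS, rates):
    print(f"{name}: {rate:.3f}")
print(f"mean: {mean_error(COUNTS):.3f}")
```

Under these assumptions the mean rounds to 0.24, in line with the reported
average error. Pooling all 2,430 instances instead would weight the large FFD
group more heavily and give a different figure.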



5.6 Selection 

To validate the effectiveness of the discriminant classifier described in
Section 5.5, we considered four types of standard bin packing instances with
known solutions (optimal or best known). Beasley’s OR-Library contains two
types of bin packing problems: u instances and t instances. The Operational
Research Library contains problems of two kinds: N instances and hard
instances. All of these instances are thoroughly described in [20].

Table 4 presents a fraction of the 1,369 instances collected. For each instance, 
the indicators and best algorithm are shown, and they were obtained as explained 
in Sections 5.2 and 5.3. These results were used for testing the classifier trained 
with random instances. The classifier predicted the right algorithm for 76% of 
the standard instances. Notice the consistency of this result with the error shown 
in Table 3. 



Table 4. Example of standard instances with their characteristics and the best algorithm

                         Characteristics indicators
Instance       Problem   Bin size  Item size  Factors  Item          Best
               size p    b         t          f        dispersion d  algorithm
Hard0.txt      0.200     0.183     0.272      0.000    0.042         AGO
N3c1w1_t.txt   0.200     0.010     0.499      0.075    0.306         FFD
T60_19.txt     0.060     0.050     0.333      0.016    0.075         MBF
U1000_19.txt   1.000     0.003     0.399      0.054    0.155         AGO
N1w2b2r0.txt   0.050     0.102     0.195      0.020    0.057         MBF



6 Conclusions and Future Work 

In this article, we propose a new approach to solving the algorithm selection
problem. The main contribution is a systematic procedure to
create mathematical models that relate algorithm performance to problem char- 
acteristics, aiming at selecting the best algorithm to solve a specific instance. 
With this approach it is possible to incorporate more than one characteristic 
into the models of algorithm performance, and get a better problem representa- 
tion than other approaches. 

For test purposes, 2,430 random instances of the bin packing problem were
generated. They were solved using seven different algorithms and were used for
training the algorithm selection system. Afterwards, to validate the system,
1,369 standard instances that have been used by the research community were
collected. The experimental results showed an accuracy of 76% in the selection
of the best algorithm for all standard instances. Since a direct comparison
cannot be made with the methods mentioned in Section 2.2, this accuracy has to
be compared with that of a random selection from the seven algorithms: 14.2%.
For the remaining instances, the selected algorithms generate a solution close
to the optimal.

An additional contribution is the systematic identification of the five
characteristics that most influence algorithm performance. The detection of
these critical characteristics for the bin packing problem was crucial to the
accuracy of the results. We consider that the principles followed in this
research can be applied to identify critical characteristics of other NP-hard
problems.

Currently, the proposed procedure is being tested for solving a design model 
of Distributed Data-objects (DD) on the Internet, which can be seen as a gener- 
alization of bin packing (BP). Our previous work shows that DD is much heavier 
than BP. 

For future work we are planning to consolidate the software system based on 
our procedure and incorporate an adaptive module that includes new knowledge 
generated from the exploitation of the system with new instances. The objective 
is to keep the system in continuous self-training. In particular, we are interested 
in working with real-life instances of the design model of Distributed Data- 
objects (DD) on the Internet and incorporate new problems as they are being 
solved.

Abstract. We introduce a metaheuristic framework for combinatorial
optimization. Our framework is similar to many existing frameworks (e.g. 
[27]) in that it is modular enough that important components can be 
independently developed to create optimizers for a wide range of prob- 
lems. Ours is different in many aspects. Among them are its combinato- 
rial emphasis and the use of simulated annealing and incremental greedy 
heuristics. We describe several annealing schedules and a hybrid strategy 
combining incremental greedy and simulated annealing heuristics. Our 
experiments show that (1) a particular annealing schedule is best on av- 
erage and (2) the hybrid strategy on average outperforms each individual 
search strategy. Additionally, our framework guarantees the feasibility of
returned solutions for combinatorial problems that permit infeasible solutions.
We further discuss a generic method for efficiently optimizing bottleneck
problems under the local-search framework.



1 Introduction 

Combinatorial optimization is important in many practical applications.
Unfortunately, most combinatorial optimization problems are NP-hard and thus
impractical to solve optimally when their sizes get large. Further, provable
approximation algorithms for them, and especially for their variants, need to
be designed and implemented carefully and specifically for each application.
This approach is sometimes not affordable, and consequently metaheuristics
such as simulated annealing, genetic algorithms, tabu search, etc. become more
attractive due to the relative ease with which they can be adapted to problems
with complicated and application-specific constraints. Systems and
methodologies [8,9,27] based on local-search heuristics have been developed to
provide flexible frameworks for creating optimizers.

In this paper, we report an improved version of Discropt (first proposed
in [23]), a general-purpose metaheuristic optimizer. It is designed based on the
local-search model and implemented in C++ in such a way that optimizers can be
constructed with minimal user effort by putting built-in components together
and/or modifying them appropriately. Discropt differs from other local-search
frameworks [8,27] in its built-in features and in the choice and implementation
of search strategies. First, it supports as built-ins (1) three important
solution types, namely permutation, subset, and set partition, which can be
used to model many combinatorial problems, and (2) two fundamentally different
search approaches, simulated annealing and incremental greedy construction.
Second, for problems which permit infeasible solutions (e.g. vertex coloring,
vertex cover, TSP on incomplete graphs), Discropt guarantees the feasibility of
returned solutions in a non-trivial manner. Third, Discropt is aware of running
times given as inputs: it requires search heuristics to adapt themselves
efficiently to different running times. Running-time awareness may be essential
in many cases; consider the vertex coloring problem, in which the time allotted
to solve an instance can depend greatly on the intended application: in
register allocation for compiler optimization [2,3], solutions are expected
within a few seconds or less, whereas in frequency channel assignment [26], the
economic significance of having a good solution may make it desirable to spend
more time on optimizing. This is similar to real-time optimizers [4,15,16],
which are formulated in the form of intelligent search strategies.

This paper is organized as follows. Discropt’s architecture is briefly discussed
and compared to similar frameworks such as HotFrame, EasyLocal++, and iOpt
[27] in Section 2. Recent improvements and an experimental comparison between
the current and previous versions, together with an experimental contrast
between Discropt and iOpt [28], are discussed in Section 3. Discropt’s
important features are discussed next. They are: (1) time-sensitive search
heuristics in Section 4, (2) the evaluation of the cost of moving between
solutions and minimization of bottleneck functions in Section 5, and (3)
systematic combination of cost and feasibility in Section 6.

2 Overview of Discropt 

Discropt is a framework based on the local search paradigm, in which the process
of minimizing an optimization problem is modeled as the search for the best
solution in a solution space. This process selects an initial solution
arbitrarily and iterates the following procedure: from the current solution, a
neighboring solution is generated and evaluated. If its objective cost is
acceptable, it replaces the older solution, and the search proceeds until a
stopping criterion is met. This abstract process can be completely constructed
from four components:

— Solution type. Discropt supports three primitive types: permutation, subset,
and set partition. Take the traveling salesman problem as an example. The
solution type is most appropriately a permutation (of vertices). On the other
hand, in the vertex cover problem the solution type should be represented as
a subset (of vertices), and in the vertex coloring problem it should be a
partition (of vertices into colors).

— Neighborhood operator. This component randomly generates neighboring
solutions of a given solution. Such generation is needed in order to traverse
the solution space. Each solution type must have at least one operator defined
specifically for it. In Discropt, the neighborhood operator for a permutation
s generates a random permutation that differs from s in exactly two indices;
the operator for a subset s (viewed as a bit-vector of 0’s and 1’s) generates
a random subset that differs from s in exactly one bit; and the operator for
a partition s generates a random partition that differs from s in the
part-membership of exactly one element.

— Search. The search component decides whether or not to replace a solution
with its randomly generated neighboring solution. It also needs to decide
when to stop searching based on the current progress and the amount of
running time left. Discropt supports two main search methods and a hybrid
strategy combining the two. The two main methods are variants of the
simulated annealing heuristic and an incremental greedy construction.

— Objective function. This component evaluates the cost of each solution and 
the cost of moving from one solution to another. 
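
The three neighborhood operators described in the list above can be sketched in
a few lines of Python. This is only an illustration of the stated behavior, not
Discropt's C++ implementation: the function names, the encoding of a partition
as a list of part indices, and the `num_parts` parameter are all assumptions
made for the example.

```python
import random


def permutation_neighbor(perm, rng):
    """Swap two distinct positions, so the result differs in exactly two indices."""
    i, j = rng.sample(range(len(perm)), 2)
    out = list(perm)
    out[i], out[j] = out[j], out[i]
    return out


def subset_neighbor(bits, rng):
    """Flip one bit of the 0/1 membership vector."""
    k = rng.randrange(len(bits))
    out = list(bits)
    out[k] = 1 - out[k]
    return out


def partition_neighbor(parts, num_parts, rng):
    """Move one element to a different part; parts[e] is the part index of e.

    Assumes num_parts >= 2 so that a different part always exists.
    """
    e = rng.randrange(len(parts))
    out = list(parts)
    out[e] = rng.choice([q for q in range(num_parts) if q != parts[e]])
    return out


rng = random.Random(0)
print(permutation_neighbor([0, 1, 2, 3, 4], rng))
print(subset_neighbor([0, 1, 1, 0], rng))
print(partition_neighbor([0, 0, 1, 2], 3, rng))
```

Each operator copies its input, so the search component can still reject the
neighbor and keep the current solution.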

Although Discropt is designed so that users can modify each component in the
ways they see best for their applications, it also provides built-in
primitives for each of these components to minimize users’ effort in creating
an operational system for many combinatorial problems. Many existing
local-search frameworks [27] share Discropt’s objective of providing a flexible
and broadly applicable platform by realizing the local-search model in an
object-oriented setting. Discropt interprets the polymorphic character of the
local-search paradigm by implementing dynamically bound search algorithms
(through C++ virtual functions) and statically bound solution types and
neighborhood operators (through C++ templates). This design choice is similar
to that of EasyLocal++ [11] and different from that of HotFrame [7]. It allows
both efficiency, by defining a solution type statically using templates, and
flexibility, by dynamically choosing and mixing search methods through the
appropriate virtual search functions. In addition to the general local-search
architecture, Discropt is similar to HotFrame in having an explicit evaluation
of the cost of moving from one solution to another. This function in many cases
improves the efficiency of cost evaluation by a linear factor (see Section 5).

While these systems are all applicable to a wide range of problems, their
effectiveness lies in how much effort each user puts into tailoring the
components suitably to his or her problem. Thus, their differences in terms of
built-in features may make one framework more appropriate for certain needs
than the others. In this respect, Discropt is different from EasyLocal++ and
HotFrame in its emphasis on the combinatorial structure of intended problems;
we support primitive combinatorial solution types (permutation, subset, and set
partition) and specifically designed neighborhood operators for each type.
Although iOpt [28] does support two solution types, vector of variables and set
of sequences, we think that many combinatorial problems are more suitably
represented as one of the three primitive types supported by Discropt. Another
difference is that Discropt uses two fundamentally different search methods,
simulated annealing and incremental greedy construction, together with a hybrid
strategy combining these two heuristics. Early experimental data [23] showed
that there were problems for which simulated annealing clearly outperformed
incremental greedy construction, and vice versa. This suggests including both
as primitive search methods and developing a hybrid strategy to take advantage
of them both. Further, Discropt fully realizes incremental greedy construction
as a local search method and incorporates the evaluation of the cost of moving
from one solution to a neighboring solution. This architectural aspect has a
significant impact on the efficiency of the incremental greedy construction in
the local search framework. Finally, Discropt is designed to be sensitive to
running time inputs. This means that both of these search methods must adjust
their own parameters to do their best within a given amount of running time.

3 Recent Improvements and Experimental Results

Since the preliminary version [23], we have made the following improvements: 

— Improved Annealing Schedules - We compared several cooling schedules for 
the simulated annealing heuristic to identify one that works well over all 
implemented problems. We found that our new annealing schedule, where 
the positive/negative acceptance rate is explicitly fitted to a time schedule, 
performed more robustly on the problems we studied. 

— When to Go Greedy - We generalized the greedy heuristic to work efficiently 
in a time-sensitive local-search environment, by treating the extensions 
of each partially constructed solution as its neighbors in the local-search 
context. We developed an improved combined search strategy heuristic that 
provides a more sophisticated composition of the simulated annealing and 
greedy heuristics. The idea of combining different search strategies cannot 
be employed in systems based on a single search strategy. 

— Bottleneck Optimization through Lp Norms - We provide a general optimiza- 
tion scheme for bottleneck optimization problems (such as graph bandwidth 
optimization) that improves the efficiency of local search heuristics for this 
class of problems. We show that optimizing under an appropriate Lp norm 
can provide better bottleneck solutions than optimizing under the actual 
bottleneck metric, due to faster objective function evaluation. 

— Guaranteed Feasibility through Local Improvement - Problems that permit
infeasible solutions (e.g. vertex cover and graph coloring) give rise to
another issue in using local search. Even though it is essential that a
feasible solution be reported at the end, restricting the search exclusively
to feasible solutions results in inefficient optimization. We present a
systematic method which combines user-defined objective cost and feasibility
functions in such a way as to guarantee that only feasible solutions are
returned for any feasibilizable problem.

Feasibility can be enforced specifically for each problem by specifying
appropriate constraints [9,22]. Our method, however, is more universal; it can
be applied to any problem that fits the local-search framework, although fine
tuning for each problem may be needed to achieve better performance.
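
To make the bottleneck item in the list above more concrete, here is a small
Python sketch for graph bandwidth. It compares the bottleneck objective (the
largest edge stretch) with a sum of p-th powers of edge stretches, whose p-th
root is the L_p norm, and shows that the surrogate can be updated after a swap
by looking only at the edges incident to the two swapped vertices. The data
layout (`pos` mapping each vertex to its index, `adj` as neighbor sets) and all
function names are assumptions for illustration, not Discropt's interface.

```python
from collections import defaultdict


def bandwidth(edges, pos):
    """Bottleneck objective: the largest stretch |pos[u] - pos[v]| over all edges."""
    return max(abs(pos[u] - pos[v]) for u, v in edges)


def lp_cost(edges, pos, p):
    """Sum of p-th powers of edge stretches; its p-th root is the L_p norm."""
    return sum(abs(pos[u] - pos[v]) ** p for u, v in edges)


def lp_swap_delta(adj, pos, a, b, p):
    """Change in lp_cost if vertices a and b exchange positions.

    Only edges incident to a or b are visited. The edge (a, b) itself, if
    present, keeps its stretch and is skipped.
    """
    def local(pa, pb):
        total = 0
        for w in adj[a]:
            if w != b:
                total += abs(pa - pos[w]) ** p
        for w in adj[b]:
            if w != a:
                total += abs(pb - pos[w]) ** p
        return total

    return local(pos[b], pos[a]) - local(pos[a], pos[b])


edges = [(0, 1), (1, 2), (2, 3), (0, 3), (1, 3)]
adj = defaultdict(set)
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

pos = {0: 0, 1: 3, 2: 1, 3: 2}  # pos[v] is the index of vertex v in the permutation
p = 4
delta = lp_swap_delta(adj, pos, 0, 1, p)
swapped = dict(pos)
swapped[0], swapped[1] = pos[1], pos[0]
print(bandwidth(edges, pos), lp_cost(edges, pos, p), delta)
```

The incremental update costs time proportional to the degrees of the two
vertices, whereas recomputing the maximum may require a pass over all edges.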

To illustrate the ease of constructing an operational system, we implemented
optimizers for the following popular combinatorial problems:

— Permutation problems: Shortest Common Superstring, Traveling Salesman, 
and Minimum Bandwidth. 

— Subset problems: Max Cut, Max Satisfiability and Min Vertex Cover. 

— Set partition problems: Vertex Coloring and Clustering. These were
implemented recently.

Table 1 presents the historical performance of our optimizer on permutation
and subset problems (partition problems were not yet implemented in the early
version) over different running time inputs. Shown in Table 1 are the costs of
the returned solutions at each running time input, on a 1GHz machine with 768MB
of RAM; lower costs are better since all problems are formulated to be
minimized. They qualitatively show that Discropt now performs substantially
better than the February 2002 version reported in [23], across almost all
problems and time scales. The lone exception is vertex cover, where direct
comparison is invalid because the previous system potentially returned
infeasible solutions; our current version provably guarantees the feasibility
of returned solutions.



Table 1. Snapshot of DISCROPT’s performance in three different versions. Running
times vary from 5 to 120 seconds. Lower costs denote better solutions.

                               Running times in seconds
Problem    Version      5    10    15    20    25    30    45    60    90   120
VertCover  08/01     2260  2315   642   654   644   650   641   643   598   586
           02/02     1100   571   554   541   546   528   531   521   521   523
           09/03      580   579   577   574   574   573   574   574   573   572
Bandwidth  08/01      604   616   611   603   595   622   620   609   595   602
           02/02      633   620   614   615   613   609   607   608   603   602
           09/03      984   841   620   599   537   597   525   539   515   526
MaxCut     08/01     3972  3972  3972  3972  3969  3972  3972  3885  1373  1349
           02/02     1401  1385  1346  1462  1452  1441  1971  1689  1276  1240
           09/03     1625  1206  1150  1155  1130  1117  1110  1098  1088  1077
SCS        08/01     3688  1152  1158  1225  1120  1035  1013   994   902   982
           02/02      614   596   570   562   544   541   515   505   502   505
           09/03      527   500   500   500   500   500   500   500   500   500
TSP        08/01     1043   846   730   642   687   568   551   508   577   504
           02/02      933   633   472   411   406   397   378   381   375   374
           09/03      377   360   357   354   345   346   348   345   343   345
MaxSat     08/01      572   705   735   533   403   518   344   518   352   318
           02/02      636   359   465   503   398   374   262   206   255   236
           09/03      462   778   462   553   276   356   344   356   155   209


Table 2 shows the average performance of five heuristics: incremental greedy
construction; variants of simulated annealing based on the constant-decay
schedule (sa1), the variance of cost at each annealing trial (sa2), and the fit
of the acceptance rate to a decreasing curve (sa3); and the combined heuristic.
Our measure of performance is the ratio of the smallest cost known for each
instance to the cost of the solution returned by each search heuristic at each
running time. This number is a value between 0 and 1 for each heuristic/problem
instance pair. The results are averaged across 2 instances of each of the
permutation and subset problems mentioned above and of the Vertex Coloring
problem. We conclude that among the annealing schedules, sa3 performs the best
on average. This agrees with our hypothesis (Section 4.2) that direct
manipulation of the acceptance rate would be more effective in time-sensitive
applications. More interestingly, on average the combined heuristic (which
attempts to select the most appropriate strategy for a given problem)
outperforms all individual heuristics of which it is composed.



Table 2. Average performance ($\frac{1}{n}\sum_{i} o_i / p(i,t)$) over 2 instances on each of 7 problems.
$o_i$: a lower bound for instance $i$; $p(i,t)$: the performance by a heuristic at the given time
$t$ on instance $i$. Because $o_i \le p(i,t)$, bigger fractions are better (maximum is 1).

                           Running times in seconds
Heuristic      5     10     15     20     25     30     45     60     90    120
greedy     0.604  0.650  0.658  0.672  0.668  0.679  0.687  0.691  0.700  0.703
sa1        0.652  0.692  0.688  0.697  0.720  0.729  0.737  0.760  0.780  0.785
sa2        0.653  0.699  0.700  0.713  0.723  0.748  0.761  0.781  0.771  0.782
sa3        0.701  0.729  0.756  0.762  0.762  0.766  0.795  0.804  0.821  0.826
combined   0.633  0.757  0.784  0.805  0.819  0.819  0.844  0.843  0.870  0.861



Tables 3 and 4 show a qualitative comparison between iOpt’s [28] and
Discropt’s performance on the Vertex Coloring problem. In Table 4, each triple
shows the number of colors of the returned solutions for the three heuristics
Hill Climbing (HC), Simulated Annealing (SA1), and Simulated Annealing with an
annealing schedule in which the acceptance rate is fit to a curve (SA3). For
example, the solutions produced by Hill Climbing on instance dsjc125.1.col have
costs 9, 8, 8, 8 at running times of 2, 4, 8, 16 seconds, respectively. It
should be noted that in iOpt the solution type is implemented as a vector of
variables, and in Discropt it is implemented as a partition (of vertices into
different colors). Further, the results were produced on different machines,
with different compilers, and possibly with different objective functions. They
nevertheless show a qualitative comparison and contrast between the two systems
and give another look at how Discropt behaves at varying running times. We
observe that Discropt’s usage of memory is more frugal.

Table 3. Performance of iOpt on three Vertex Coloring instances on an 866MHz PC.
Memory taken during each run is maximally about 30MB of RAM.

                 Hill Climbing      Simul. Ann.        Tabu Search
Problem          colors  time       colors  time       colors  time
dsjc125.1.col    7       2.37       6       166.06     5       44.00
dsjc125.5.col    21      70.18      20      334.75     18      557.00
dsjc125.9.col    46      492.81     46      347.40     44      2904.75

Table 4. Performance of Discropt on three Vertex Coloring instances on a 2GHz PC.
Memory taken during each run is maximally about 1.3MB of RAM.

                 Number of colors for HC, SA1, SA3
Problem          2 seconds   4 seconds   8 seconds   16 seconds
dsjc125.1.col    9,7,7       8,7,7       8,6,7       8,6,7
dsjc125.5.col    24,20,21    24,20,20    23,20,20    22,20,20
dsjc125.9.col    52,47,45    52,46,45    52,46,45    52,45,44



4 Time-Sensitive Search Heuristics 

Real-time search strategies have been studied in the field of Artificial
Intelligence [15,16] and in non-local-search combinatorial optimization [4].
Existing local-search time-sensitive heuristics in the literature have
generally been qualitative, in the sense that there is a set of parameters
whose chosen values translate into a qualitatively longer or shorter running
time. Lam and Delosme [17,18,19] proposed an annealing schedule for the
simulated annealing heuristic in which how long a search runs depends
qualitatively on $\lambda$, the upper bound of the expected fitness of adjacent
annealing trials. Hu, Kahng, and Tsao [13] suggested a variant of the threshold
accepting heuristic based on the so-called old bachelor acceptance criteria, in
which the trial length, and hence the running time, is qualitatively controlled
by users. A general theoretical model for evaluating time-sensitive search
heuristics is discussed in [24].

Discropt’s search heuristics are based on three main methods: (1) a general
incremental greedy construction heuristic, (2) a global simulated annealing
heuristic, and (3) a combined hybrid strategy that appropriately allocates time
among different heuristics. Viewed abstractly, each of our metaheuristics is an
iteration of the following process: start with an initial solution, generate a
random neighboring solution, and decide whether to move to that neighbor or to
stop, according to the strategy defined by the metaheuristic.

Discropt employs a time-sensitive policy: each heuristic should adjust its
parameters so that the search performs as well as possible within a given
running time. Greedy and simulated annealing implement this policy by varying
the size of the selection pool (greedy) and the rate of converging to a final
temperature (simulated annealing) in order to converge to local minima when
the given running time expires. Converging to local minima in a non-trivial
manner is arguably the best that can be done within a given running time.
Further, by varying the parameters as described, better minima are achieved at
longer running times. In other words, performance is expected to increase with
running time.

4.1 Time-Sensitive Incremental Greedy Construction

The incremental greedy construction heuristic (see Algorithm 1) starts from
an empty solution and at each step selects the best element among a pool of
candidates according to some problem-specific criteria. The size of each pool
of candidates is determined by how much time remains and how much time was
spent on average per item in the last selection. Discropt treats incremental
greedy construction as a local-search technique by supporting random generation
of extensions of partial solutions. Thus, the process of incrementally
extending partial solutions to larger ones is analogous to the process of
moving from one solution to another. Since incremental greedy construction is
viewed as a local-search heuristic, we can take advantage of strategies applied
to local-search heuristics, including combining it with other local-search
heuristics (see Section 4.3) and optimizing bottleneck problems efficiently
(see Section 5).

Our notion of incremental construction is similar to but more general than 
the Brelaz [1] graph coloring heuristic, and is related to the generalized greedy 
heuristic of Feo and Resende [6]. Greedy techniques have also been studied in 
intelligent search strategies [15]. 



Algorithm 1 Time-sensitive, local-search, incremental greedy construction
  Let $t$ be the given running time input.
  Let $C$ be the pool of all possible elements.
  $s = \emptyset$.
  while $C \neq \emptyset$ do
    Use $t$ to determine $C_i \subseteq C$, a random subset of $C$ with $i$ elements.
    Use $t$ to determine $P_i \subseteq P$, where $P$ is the set of all possible positions of $s$.
    Based on $s$, select the best $e \in C_i$ and $p \in P_i$.
    if $s \cup e$ is feasible then
      Insert $e$ in $s$ at position $p$.
      $C = C - e$.
    end if
    Estimate the remaining time and update $t$.
  end while
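
A runnable Python sketch of the idea behind Algorithm 1 is given below for
building a TSP tour by insertion. It is an illustration under stated
assumptions, not Discropt's code: the rule that turns the remaining time into
pool sizes (an even split of the remaining time over the remaining steps, using
a running estimate of the time per candidate evaluation) is our own choice, as
are all function and variable names. Every insertion is feasible in the TSP, so
the feasibility test of Algorithm 1 does not appear.

```python
import math
import random
import time


def insertion_delta(tour, dist, e, p):
    """Extra tour length caused by inserting city e just before index p."""
    if not tour:
        return 0.0
    a, b = tour[p - 1], tour[p]  # tour[-1] wraps around for p == 0
    return dist[a][e] + dist[e][b] - dist[a][b]


def timed_greedy_tour(dist, budget_s, rng):
    """Build a tour greedily, sizing candidate pools from the remaining time."""
    deadline = time.monotonic() + budget_s
    remaining = list(range(len(dist)))  # pool C of unused cities
    tour = []                           # partial solution s
    per_eval = 1e-6                     # running estimate of seconds per evaluation
    while remaining:
        time_left = max(deadline - time.monotonic(), 0.0)
        # Evaluations affordable now if the rest of the budget is split evenly.
        affordable = max(1, int(time_left / (per_eval * len(remaining))))
        n_pos = max(1, len(tour))
        i = min(len(remaining), affordable)      # size of candidate pool C_i
        j = min(n_pos, max(1, affordable // i))  # size of position pool P_i
        candidates = rng.sample(remaining, i)
        positions = rng.sample(range(n_pos), j)
        start = time.monotonic()
        e, p = min(
            ((c, q) for c in candidates for q in positions),
            key=lambda cq: insertion_delta(tour, dist, cq[0], cq[1]),
        )
        tour.insert(p, e)
        remaining.remove(e)
        per_eval = max((time.monotonic() - start) / (i * j), 1e-9)
    return tour


rng = random.Random(0)
points = [(rng.random(), rng.random()) for _ in range(30)]
dist = [[math.dist(a, b) for b in points] for a in points]
tour = timed_greedy_tour(dist, budget_s=0.05, rng=rng)
print(len(tour))
```

With a budget of zero the pools shrink to a single random candidate and
position, so the construction degenerates into a random tour; larger budgets
allow larger pools and therefore greedier choices.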



4.2 Time-Sensitive Simulated Annealing 

In simulated annealing, moving from a solution to a neighboring solution is
determined with probability $\exp(-\Delta C / T_k)$, where $\Delta C$ is the
difference in cost between the two solutions and $T_k$ is the current
temperature. A final temperature, $T_f$, is derived by statistical sampling in
such a way that it is the temperature at which neighborhood movement is
strictly descending; at $T_f$, the search behaves identically to hill climbing.
An annealing schedule is characterized by the manner in which temperature is
reduced. This reduction must be carried out so that $T_f$ is reached just as
time is about to expire, and each manner of periodically reducing temperature
yields a different annealing schedule. We explore three classes of schedule in
an attempt to identify which works best in a time-sensitive setting:

— Traditional Annealing Schedules - These simple schedules reduce the
temperature by a constant fraction at regular intervals so that the final
temperature $T_f$ is reached as time expires.

— Variance-Based Schedules - These schedules reduce temperature by taking
into account the variance of cost at each annealing trial (i.e. the period
during which $T_k$ is kept constant). The idea is that if the costs during
each trial vary substantially (high variance), then $T_k$ is reduced slowly,
and vice versa. This schedule is similar to that of Huang, Romeo, and
Sangiovanni-Vincentelli [14], with an additional effort to make sure $T_f$ is
reached at the end.

— Forced-Trajectory Schedules - This class of schedules appears to be novel. 
Schedules that are not aware of running times indirectly control the rate of 
accepting solutions by manipulating temperature. In a time-sensitive con- 
text, we suspect that this indirect mechanism may not be robust enough. 
These schedules forcefully fit the acceptance rate (i.e. the ratio of upward to 
downward movements) to a monotonically decreasing curve. The rate of ac- 
cepting solutions is high initially, say 0.75, but decreases to 0 as time expires. 
Figure 1 shows examples of such curves with different rates. 

Experimentally, we found that our forced-trajectory schedule outperformed 
the other two classes of schedule, averaging over instances of Vertex Coloring and 
all of the permutation and subset problems mentioned before; see experimental 
results in Section 3. 
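
The following Python sketch illustrates two ingredients of the schedules
discussed above. The first two functions show the usual acceptance rule and a
constant-fraction (geometric) cooling formula tuned to hit the final
temperature exactly when time expires. The third evaluates the curve printed
with Fig. 1, under the assumption that x denotes the remaining running time, so
that the target rate starts at r and falls to 0 at expiry; how the temperature
is then adjusted to track this target is not specified in the text and is not
modeled here. All function names are illustrative.

```python
import math


def accept_probability(delta_cost, temperature):
    """Always accept improvements; accept a worsening with probability exp(-delta/T)."""
    if delta_cost <= 0:
        return 1.0
    return math.exp(-delta_cost / temperature)


def constant_decay_temperature(t0, tf, elapsed, total):
    """Geometric cooling from t0 that reaches tf exactly when elapsed == total."""
    frac = min(max(elapsed / total, 0.0), 1.0)
    return t0 * (tf / t0) ** frac


def target_acceptance_rate(r, a, elapsed, total):
    """rate = exp(ln(1 + r) * (x / t)^a) - 1, with x the remaining time (assumed)."""
    x = max(total - elapsed, 0.0)
    return math.exp(math.log(1.0 + r) * (x / total) ** a) - 1.0


for elapsed in (0, 15, 30, 45, 60):
    temp = constant_decay_temperature(10.0, 0.1, elapsed, 60)
    rate = target_acceptance_rate(0.75, 2.0, elapsed, 60)
    print(elapsed, round(temp, 3), round(rate, 3))
```

The exponent a controls how quickly the target rate falls away from its
initial value r, which is one way to obtain a family of curves like those in
Fig. 1.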



4.3 Combination of Time-Sensitive Heuristics 

Greedy and local-search strategies have previously been combined for specific
applications [20,21,25]. We found experimentally in [23] that simulated
annealing outperformed incremental greedy construction on a number of problems,
and vice versa. Specifically, the greedy heuristic was found to be superior in
minimizing the shortest common superstring problem, whereas simulated annealing
excelled in minimizing the bandwidth reduction problem. These results
empirically agree with the Wolpert and Macready no-free-lunch theorems [29,30].

$\mathrm{rate}(r, a, t) = \exp(\ln(1+r) \cdot (x/t)^{a}) - 1$

Fig. 1. Three different forced-trajectory annealing schedules.

When no search strategy dominates the rest, or when it is unknown which
strategy should be selected, it makes sense to take advantage of both by
combining them strategically. Our hybrid strategy can be outlined as follows:

1. Predict longer-run performance based on short-run performance. Given short 
running times to each heuristic, we pick the seemingly dominant heuristic 
at the given running time. Which search heuristic is dominant is roughly 
estimated as follows: 

- Hill-climb dominates if its short-run result is better than those of both
greedy and simulated annealing.

- Otherwise, greedy dominates if its short-run performance is best.

- Otherwise, simulated annealing dominates.

2. Combine greedy and simulated annealing by running simulated annealing 
on a local optimum obtained by greedy whenever no heuristic dominates the 
others. 

As expected, we found empirically that this hybrid metaheuristic outperformed
each individual one, averaging over all studied problems; see Section 3.
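
The selection rule outlined above can be sketched in a few lines of Python.
This is only an illustration under stated assumptions, not Discropt's code: the
dictionary keys, the function name and the returned labels are hypothetical,
costs are assumed to be minimized, and a tie for the best short-run result is
interpreted here as "no heuristic dominates".

```python
def choose_strategy(short_run):
    """Pick a strategy from short-run costs (lower is better).

    `short_run` maps the hypothetical keys "hill_climb", "greedy" and
    "annealing" to the best cost each heuristic reached in its trial run.
    """
    hc = short_run["hill_climb"]
    gr = short_run["greedy"]
    sa = short_run["annealing"]
    if hc < gr and hc < sa:
        return "hill_climb"
    if gr < hc and gr < sa:
        return "greedy"
    if sa < hc and sa < gr:
        return "annealing"
    # Assumption: a tie for the best result means no heuristic dominates,
    # so annealing is started from the greedy local optimum (step 2).
    return "greedy_then_annealing"
```

For example, `choose_strategy({"hill_climb": 12, "greedy": 9, "annealing": 9})`
selects the combined strategy, because neither greedy nor annealing is strictly
best.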

5 Optimization of Bottleneck Problems 

Discropt requires that, in addition to an objective function, a function be
implemented that evaluates the cost of moving from one solution to a neighbor.
This allows the cost of a solution to be obtained from the cost of moving to it
from a neighbor whose cost is known. In many cases the neighborhood structure
is refined enough that evaluating the difference in cost between two solutions
is simpler than evaluating the cost of each solution, and this indirect
evaluation of cost improves efficiency by a linear factor. An example is the
TSP, whose objective function is defined as
$f(\pi) = d(c_{\pi(n)}, c_{\pi(1)}) + \sum_{i=1}^{n-1} d(c_{\pi(i)}, c_{\pi(i+1)})$.
If two neighboring solutions (which are permutations) differ in only a constant
number of positions (e.g. 2), the difference in cost can be evaluated in $O(1)$
steps, instead of the $O(n)$ steps required to evaluate directly the cost of
each solution.
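
As a minimal sketch of this idea (the function names and the distance-matrix
representation are assumptions made for the example, not Discropt's interface),
the cost change caused by swapping two positions of a tour touches at most four
edges, so it can be computed without re-evaluating the whole tour:

```python
def tour_cost(perm, d):
    """Full O(n) evaluation of a closed tour over distance matrix d."""
    n = len(perm)
    return sum(d[perm[i]][perm[(i + 1) % n]] for i in range(n))


def swap_delta(perm, d, i, j):
    """Cost change from swapping positions i and j, touching at most 4 edges."""
    n = len(perm)
    # An edge is identified by its starting position k: (perm[k], perm[k+1]).
    edges = {(i - 1) % n, i, (j - 1) % n, j}
    before = sum(d[perm[k]][perm[(k + 1) % n]] for k in edges)
    perm[i], perm[j] = perm[j], perm[i]
    after = sum(d[perm[k]][perm[(k + 1) % n]] for k in edges)
    perm[i], perm[j] = perm[j], perm[i]  # restore the original tour
    return after - before
```

Collecting the affected edge positions in a set handles adjacent positions
(and the wrap-around edge) without special cases, and the sketch does not
assume a symmetric distance matrix.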

Unfortunately, for bottleneck problems (which have the form max{...}),
evaluating the difference in cost between neighboring solutions can be as
expensive as evaluating the cost of each solution, so there is no gain. An
example is the minimum bandwidth reduction problem, which is defined as
$f(\pi) = \max_{(u,v) \in E} |\pi(u) - \pi(v)|$, where $\pi$ is a permutation
of the vertices. Changing a permutation a little can result in a drastic change
in the value of the maximum span, thus essentially forcing total re-evaluation
of the solution cost. This inefficiency is a result of the max function not
being continuous enough over the solution space. As a remedy, we propose a
generic method that is applicable to any bottleneck problem: instead of
optimizing the bottleneck function itself, optimize a more continuous
approximation of it. Since we have

$$\max\{x_1, \ldots, x_n\} = \lim_{p \to \infty} \Bigl( \sum_{i=1}^{n} x_i^p \Bigr)^{1/p},$$

instead of optimizing a max{...}, we optimize a $\sum\{\cdots\}$. It can be
shown, similarly to the case of TSP, that evaluating the difference in cost of
two neighboring solutions when the objective function is a $\sum$ is much more
efficient than evaluating directly the cost of each solution. In the case of
minimum bandwidth reduction, the new function to be minimized is
$\bigl(\sum_{(u,v) \in E} |\pi(u) - \pi(v)|^p\bigr)^{1/p}$. Table 5 shows a
comparison of minimizing the bottleneck function to minimizing p-$\sum$'s for
various p. As shown, minimizing any p-$\sum$ yields much better results due to
efficient cost evaluation.
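
To illustrate why the sum form is cheap to update, the sketch below computes
the change in $\sum_{(u,v) \in E} |\pi(u) - \pi(v)|^p$ when two vertices
exchange positions. It is an assumption-laden example rather than the paper's
implementation: the graph is taken to be simple, `pos` maps each vertex to its
position, `adj` maps each vertex to the set of its neighbors, and the outer
$1/p$ root is dropped because it does not change which solution is better.

```python
def lp_sum(pos, edges, p):
    """Full evaluation of sum |pos[u] - pos[v]|^p over all edges."""
    return sum(abs(pos[u] - pos[v]) ** p for u, v in edges)


def swap_delta_lp(pos, adj, a, b, p):
    """Change in the p-sum when vertices a and b exchange positions.

    Only edges incident to a or b are inspected; the edge (a, b) itself
    keeps its span and is skipped.
    """
    def local(pa, pb):
        total = 0
        for w in adj[a]:
            if w != b:
                total += abs(pa - pos[w]) ** p
        for w in adj[b]:
            if w != a:
                total += abs(pb - pos[w]) ** p
        return total

    return local(pos[b], pos[a]) - local(pos[a], pos[b])
```

The work is proportional to the degrees of the two vertices rather than to the
number of edges, whereas the max form may need a full pass whenever the edge
that attained the maximum changes.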

Other non-local-search algorithms have benefited from optimizing an $L_p$
function instead of the originally stated $L_q$. These include Csirik et al.
[5], who obtained several approximation results by optimizing an $L_2$ function
for bin packing, an NP-hard $L_1$ problem [10]. Similarly, Gonzalez [12]
obtained a 2-approximation to the NP-hard $L_2$ clustering problem by
optimizing a reformulated $L_\infty$ version.



Table 5. Simulated Annealing on Bandwidth Reduction with different Lp's.
Entries are actual bandwidths (max{c_i}) at different running times.

Optimized metric     5s   10s   15s   20s   25s   30s   45s   60s   90s  120s
L∞ = max{c_i}       860   806   817   743   747   735   704   675   651   623
L2                  525   542   552   568   551   504   514   502   507   506
L4                  349   322   336   339   328   316   313   320   308   302
L6                  310   290   288   276   275   290   278   278   267   268
L8                  302   317   293   283   318   295   276   282   267   313
L10                 333   283   299   296   333   352   314   288   368   252




An Improved Time-Sensitive Metaheuristic Framework 






6 Separation of Cost and Feasibility 

Discropt uses a systematic strategy to combine cost and feasibility, given that
they have been defined separately as two different functions to be simultaneously
optimized. Problems such as graph coloring contain infeasible solutions
(invalid colorings), which are unacceptable as returned solutions. One
resolution of this issue would be to incorporate a penalty mechanism in the
definition of the objective function, for example by penalizing an invalid
coloring proportionally to the number of violated edges. Any such method must
ensure that returned solutions are feasible. In Discropt, cost and feasibility
can optionally be defined as two separate functions, and the objective function
to be minimized is defined in terms of these two functions. This approach
guarantees two things: (1) returned solutions are local optima with respect to
the new objective function and (2) local optima are provably feasible. To
accomplish this, we keep track of the current maximum rate of change of cost
with respect to feasibility, and of a function $\epsilon(t)$ that allows a
qualitative emphasis to be placed on cost initially and on feasibility
eventually, by ensuring that $\epsilon(t) \to 0$.

A problem with cost function c and feasibility function f is called feasibilizable
if any infeasible solution has a "more" feasible neighboring solution, i.e. one with
a lower f-value. For example, graph coloring under the swap neighborhood is
feasibilizable, because we can always assign an offending vertex a new color, thus
guaranteeing the existence of a more feasible neighbor of an infeasible solution.

Proposition 1. Let P be a feasibilizable problem with cost function c and
feasibility function f. Define the linear objective function g

$$g(A, c, f) = c(A) + (\delta(t) + \epsilon(t)) \cdot f(A)$$

that evaluates the objective function of a solution A by combining the given
cost and feasibility functions, using the dynamically updated $\delta$ and
$\epsilon$, which are functions of time and defined as

$$\delta(t) = \max \frac{|c(A) - c(B)|}{|f(A) - f(B)|}, \qquad \lim_{t \to \infty} \epsilon(t) = 0,$$

for all solutions A and B which are neighbors encountered during the search.
Then, any solution U that is locally optimal with respect to g is feasible.

Proof. If U is not feasible, there is a neighbor V of U such that $f(U) > f(V)$
and $g(U) = c(U) + (\delta(t) + \epsilon(t)) \cdot f(U) \le c(V) + (\delta(t) + \epsilon(t)) \cdot f(V) = g(V)$,
since U is locally optimal with respect to g. This implies

$$0 < (\delta(t) + \epsilon(t)) \cdot (f(U) - f(V)) \le c(V) - c(U),$$

which implies $0 < \delta(t) + \epsilon(t) \le \frac{c(V) - c(U)}{f(U) - f(V)}$
as $t \to \infty$ (i.e. after a number of steps). This means
$\delta(t) < \frac{c(V) - c(U)}{f(U) - f(V)}$, contradicting the definition of
$\delta(t)$.

□
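
The combined objective of Proposition 1 can be sketched as a small Python
class. This is an illustrative sketch under assumptions, not Discropt's
implementation: the class and method names are hypothetical, $\epsilon(t)$ is
given the arbitrary decreasing schedule `eps0 / (1 + steps)`, and neighbor
pairs with equal feasibility values are skipped when updating $\delta$ to avoid
a division by zero. The toy problem colors the path graph 0-1-2, with cost
equal to the number of colors used and infeasibility equal to the number of
violated edges.

```python
class CombinedObjective:
    """g(A) = c(A) + (delta + eps) * f(A), with delta learned during search."""

    def __init__(self, cost, infeasibility, eps0=1.0):
        self.cost = cost
        self.infeasibility = infeasibility
        self.eps0 = eps0
        self.delta = 0.0
        self.steps = 0

    def observe(self, a, b):
        """Record a pair of neighboring solutions visited by the search."""
        self.steps += 1
        df = abs(self.infeasibility(a) - self.infeasibility(b))
        if df > 0:  # assumption: pairs with equal feasibility are skipped
            dc = abs(self.cost(a) - self.cost(b))
            self.delta = max(self.delta, dc / df)

    def epsilon(self):
        # Hypothetical schedule; any positive function tending to 0 would do.
        return self.eps0 / (1 + self.steps)

    def __call__(self, a):
        weight = self.delta + self.epsilon()
        return self.cost(a) + weight * self.infeasibility(a)


EDGES = [(0, 1), (1, 2)]  # path graph on three vertices


def colors_used(coloring):
    return len(set(coloring))


def violations(coloring):
    return sum(coloring[u] == coloring[v] for u, v in EDGES)


g = CombinedObjective(colors_used, violations)
bad, good = (0, 0, 0), (0, 1, 0)   # neighbors: only vertex 1 changes color
g.observe(bad, good)
```

After observing this pair, the feasible coloring `good` scores better than the
cheaper but invalid coloring `bad`, which is the behavior the proposition
guarantees at local optima.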
Abstract. Even codes are Huffman-based prefix codes with the additional
property of being able to detect the occurrence of an odd number
of 1-bit errors in the message. They were motivated by a
problem posed by Hamming in 1980. Even codes have been studied for
the case in which the symbols have uniform probabilities. In the present
work, we consider the general situation of arbitrary probabilities. We
describe an exact algorithm for constructing an optimal even code. The
algorithm has complexity $O(n^3 \log n)$, where n is the number of symbols.
Further, we describe a heuristic for constructing a nearly optimal
even code, which requires $O(n \log n)$ time. The cost of an even code
constructed by the heuristic is at most 50% higher than the cost of a
Huffman code for the same probabilities. That is, less than 50% higher
than the cost of the corresponding optimal even code. However, computer
experiments have shown that, for practical purposes, this value
seems to be much smaller: at most 5%, for n large enough. This corresponds
to the overhead in the size of the encoded message for having the ability
to detect an odd number of 1-bit errors.



1 Introduction 

Huffman codes [4] form one of the most traditional methods of coding. One of the
important aspects of these codes is the possibility of handling encodings of
variable sizes. A great number of extensions and variations of the classical
Huffman codes have been described over time. For instance, Faller [1], Gallager
[2], Knuth [6] and Milidiu, Laber and Pessoa [10] addressed adaptive methods
for the construction of Huffman trees. Huffman trees with minimum height were
described by Schwartz [12]. The construction of Huffman-type trees with length
constraints was considered by Turpin and Moffat [13], Larmore and Hirschberg
* Partially supported by Conselho Nacional de Desenvolvimento Científico e
Tecnológico and Fundação de Amparo à Pesquisa do Estado do Rio de Janeiro.
Table 1. Example of a Hamming-Huffman Code.

Symbol   Encoding
a        000
b        0110
c        1010
d        1100
e        1111



[7] and Milidiu and Laber [8,9]. On the other hand, Hamming formulated
algorithms for the construction of error detecting codes [3]. Further, Hamming [3]
posed the problem of describing an algorithm that would combine the advantages
of Huffman codes with the noise protection of Hamming codes. The idea is to
define a prefix code in which the encoding would contain redundancies that
would allow the detection of certain kinds of errors. This is equivalent to
forbidding some encodings which, when present in the reception, would signal an
error. Such a code is a Hamming-Huffman code and its representing binary tree, a
Hamming-Huffman tree. In a Huffman tree, all leaves correspond to encodings.
In a Hamming-Huffman tree, there are encoding leaves and error leaves. Hitting
an error leaf in the decoding process indicates the existence of an error.
The problem posed by Hamming is to detect the occurrence of an error of one
bit, as illustrated in the following example given by Hamming [3, p. 76]. Table 1
shows the symbols and their corresponding encodings. Figure 1 depicts the
corresponding Hamming-Huffman tree. Error leaves are represented by black nodes.
An error of one bit in the above encodings would lead to an error leaf.
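
The last claim can be checked mechanically for the code of Table 1. The short
Python sketch below is only a verification aid (the helper names are
hypothetical): it walks a corrupted encoding down the code tree and reports
whether the walk leaves the tree before reaching any encoding leaf, which
corresponds to hitting an error leaf of the full representation.

```python
CODE = {"a": "000", "b": "0110", "c": "1010", "d": "1100", "e": "1111"}


def hits_error_leaf(word, code):
    """True if walking `word` down the code tree falls off before a leaf."""
    words = set(code.values())
    prefixes = {w[:k] for w in words for k in range(len(w) + 1)}
    for k in range(1, len(word) + 1):
        if word[:k] in words:
            return False      # reached an encoding leaf
        if word[:k] not in prefixes:
            return True       # left the tree: an error leaf was hit
    return False              # stopped at an internal node


def flip(word, i):
    """Return `word` with bit i inverted."""
    return word[:i] + ("1" if word[i] == "0" else "0") + word[i + 1:]


all_detected = all(
    hits_error_leaf(flip(w, i), CODE)
    for w in CODE.values()
    for i in range(len(w))
)
```

Here `all_detected` is `True`: each of the 19 single-bit corruptions of an
encoding leads to an error leaf.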




Motivated by the above problem, we have recently proposed [11] a special
prefix code, called even code, which has the property of detecting the
occurrence of any odd number of 1-bit errors in the message. In [11], the study was
restricted to codes corresponding to symbols having uniform probabilities. In
the present paper, we consider the general situation of arbitrary probabilities.
First, we describe an exact algorithm for constructing an optimal even code, for
a given set of symbols, each one with a given probability. The algorithm employs
dynamic programming and its complexity is $O(n^3 \log n)$, where n is the number
of symbols. Next, we propose a simple heuristic for approximating an optimal
code, based on Huffman's algorithm. The time required for computing an even
code using the heuristic is $O(n \log n)$. We show that the cost of an even code
constructed by the heuristic is at most 50% higher than the cost of a Huffman
code for the same probabilities. That is, less than 50% higher than the
corresponding optimal even code. However, for practical purposes, this value seems
to be much smaller. In fact, we have performed several computer experiments,
obtaining values always less than 5%, except for small n. This corresponds to the
overhead in the size of the encoded message for having the ability to detect an
odd number of 1-bit errors.

The plan of the paper is as follows. In Section 2 we describe the Exact 
Algorithm for constructing an optimal even code. The heuristics is formulated 
in Section 3. In Section 4 we present comparisons between the code obtained by 
the heuristics and a corresponding Huffman code for the same probabilities. The 
comparisons are both analytical and experimental. 

The following definitions are of interest. 

Let $S = \{s_1, \ldots, s_n\}$ be a set of elements, called symbols. Each
$s_i \in S$ has an associated probability $f_i$. Throughout the paper, we assume
$f_i \le f_{i+1}$. An encoding $e_i$ for a symbol $s_i \in S$ is a finite
sequence of 0's and 1's, associated to $s_i$. Each 0 and 1 is a bit of $e_i$.
The parity of $e_i$ is the parity of the quantity of 1's contained in $e_i$. A
subsequence of $e_i$ starting from its first bit is a prefix of $e_i$. The set of
encodings for all symbols of S is a code C for S. A code in which every encoding
does not coincide with a prefix of any other encoding is a prefix code.

A message M is a sequence of symbols. The encoded message of M is the
corresponding sequence of encodings. The parity of an encoded message is the
parity of the number of 1's contained in it.

A binary tree is a rooted tree T in which every node z, other than the root, is
labelled left or right in such a way that any two siblings have different labels. Say
that T is trivial when it consists of a single node. A binary forest is a set of binary
trees. A path of T is a sequence of nodes $z_1, \ldots, z_t$ such that $z_q$ is
the parent of $z_{q+1}$. The value t - 1 is the size of the path, whereas all
$z_i$ are descendants of $z_1$. If $z_1$ is the root then $z_1, \ldots, z_t$ is a
root path and, in addition, if $z_t$ is a leaf, then $z_1, \ldots, z_t$ is a
root-leaf path of T. The depth of a node is the size of the root
path to it. For a node z of T, T(z) denotes the subtree of T rooted at z, that is,
the binary tree whose root is z and containing all descendants of z in T. The left
subtree of z is the subtree T(z'), where z' is the left child of z. Similarly, define
the right subtree of z. The left and right subtrees of the root of T are denoted
by $T_L$ and $T_R$, respectively. A strictly binary tree is one in which every node is
a leaf or has two children. A full binary tree is a strictly binary tree in which all
root-leaf paths have the same size. The edges of T leading to left children are
labelled 0, whereas those leading to right children are labelled 1. The parity of
a node z is the parity of the quantity of 1's among the edges forming the root
path to z. A node is even or odd, according to its parity.

A (binary tree) representation of a code C is a binary tree T such that there
exists a one-to-one correspondence between encodings $e_i \in C$ and root-leaf
paths $p_i$ of T in such a way that $e_i$ is precisely the sequence of labels, 0
or 1, of the edges forming $p_i$. A code admits a binary tree representation if
and only if it is a prefix code. Let $d_i$ be the depth of the leaf of T
associated to the symbol $s_i$. Define the cost as the sum
$c(T) = \sum_{i=1}^{n} f_i d_i$. Hence, the cost of a trivial tree is
0. An optimal code (tree) is one with the least cost. A full representation tree of
C is a binary tree T* obtained from the representation tree T of C, by adding a
new leaf as the second child of every node having exactly one child. The original
leaves of T are the encoding leaves, whereas the newly introduced leaves are the
error leaves. Clearly, in the case of Huffman trees, there are no error leaves.




An even (odd) code is a prefix code in which all encodings are even (odd). 
Similarly, an even (odd) tree is a tree representation of an even (odd) code. 
Examples of even trees for up to three symbols appear in Figure 2, while Figure 
3 depicts an optimal even tree for 11 symbols of uniform probabilities. 

It is simple to conclude that even codes detect the occurrence of an odd
number of 1-bit errors in a message, as follows. We know that all encodings are
even, so the encoded message is also even. By introducing an odd number of
errors, the encoded message becomes odd. Since the encodings are even, the
latter implies that, in the full tree representation of the code, either an
error leaf is hit during the decoding process or the process terminates at
some odd node of the tree. It should be noted that odd codes do not have this
property. For example, if we have a code C = {1, 01} and a message 01, and the
first bit is changed, resulting in 11, the message would be wrongly decoded
without signaling an error.
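
The parity argument can be tried out with a small decoder. The sketch below is
illustrative only: the three-symbol even code {0, 11, 101} and all names are
chosen for the example and do not come from the paper. The decoder returns
`None` when it leaves the code tree (an error leaf) or when the input ends in
the middle of an encoding.

```python
def decode(bits, code):
    """Decode a bit string; return None if an error is detected."""
    inverse = {enc: sym for sym, enc in code.items()}
    proper_prefixes = {enc[:k] for enc in inverse for k in range(len(enc))}
    out, cur = [], ""
    for b in bits:
        cur += b
        if cur in inverse:
            out.append(inverse[cur])   # reached an encoding leaf
            cur = ""
        elif cur not in proper_prefixes:
            return None                # fell off the tree: error leaf
    return out if cur == "" else None  # input ended at an internal node


def flip(bits, i):
    """Return `bits` with bit i inverted."""
    return bits[:i] + ("1" if bits[i] == "0" else "0") + bits[i + 1:]


EVEN = {"a": "0", "b": "11", "c": "101"}   # every encoding has an even number of 1's
ODD = {"x": "1", "y": "01"}                # every encoding has an odd number of 1's

encoded = "".join(EVEN[s] for s in "abcab")
single_errors_detected = all(
    decode(flip(encoded, i), EVEN) is None for i in range(len(encoded))
)
```

With the even code every single-bit corruption of `encoded` is rejected,
whereas with the odd code of the example above, `decode("11", ODD)` silently
returns `["x", "x"]` for the corrupted message.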
2 Exact Algorithm

In this section, we describe an exact algorithm for constructing an optimal even
tree for symbols with arbitrary probabilities.

Let $S = \{s_1, \ldots, s_n\}$ be a set of symbols, each $s_i$ having
probability $f_i$ satisfying $f_i \le f_{i+1}$. For $m \le n$, denote
$S_m = \{s_1, \ldots, s_m\}$.

Our aim is to find an even code C for S, having minimum cost. In fact, we
propose a solution for a slightly more general problem.

A parity forest F for $S_m$ is a set of p even trees and q odd trees such
that their leaves correspond to the symbols of $S_m$, for $0 \le p, q \le m$,
with $p \ne 0$ or $q \ne 0$. Define the cost of F as the sum of the costs of its
trees. Say that F is (m,p,q)-optimal when its cost is the least among all
forests for $S_m$ having p even trees and q odd trees. Denote by c(m,p,q) the
cost of an (m,p,q)-optimal forest. In terms of this notation, the solution of
our problem is c(n,1,0).

First, define the function

$$A_i = \begin{cases} \sum_{j=1}^{i} f_j, & \text{if } i > 0 \\ 0, & \text{otherwise} \end{cases}$$

The following theorem describes the computation of c(m,p,q).

Theorem 1. Let m, p, q be integers such that $0 \le m \le n$, $m \ge p \ge 0$ and
$m \ge q \ge 1$. Let $d_{max} = \lfloor \log \frac{m}{p+q} \rfloor + 1$. Then:

(1) if $m \le p + q$ then $c(m,p,q) = A_{m-p}$

(2) if $m > p + q$ and $p = 0$ then $c(m,p,q) = c(m,q,q) + A_m$

(3) if $m > p + q$ and $p \ne 0$, then c(m,p,q) is equal to

$$\min\Bigl\{ c(m-1, p-1, q),\ \min_{1 \le d \le d_{max}} \bigl\{ c(m-1, (p+q) 2^{d-1} - 1, (p+q) 2^{d-1}) + d \cdot A_m \bigr\} \Bigr\}$$










A Huffman-Based Error Detecting Code



Proof: By induction, we show that cases (1)-(3) correctly compute c(m,p,q),
for $0 \le m \le n$, $m \ge p \ge 0$ and $m \ge q \ge 1$. When m = 0, case (1)
implies that c(0,p,q) = 0, which is correct since there are no symbols. For
m > 0, let F be an (m,p,q)-optimal forest for $S_m$. Consider the alternatives.

(1) $m \le p + q$: In this case, the p even trees of F contain the p symbols of
highest probabilities, respectively. In addition, when m > p, the remaining
m - p symbols are assigned to the leaves of m - p odd trees, respectively, each
of these trees consisting of exactly one edge. Then $c(m,p,q) = A_{m-p}$.

(2) $m > p + q$ and $p = 0$: Then F consists of q odd trees, all of them
non-empty and non-trivial. We know that the q left subtrees of the trees of F
are odd trees, while the q right subtrees are even trees. Furthermore, the
children of the roots of F are roots of an (m,q,q)-optimal forest. Hence
$c(m,0,q) = c(m,q,q) + A_m$.

(3) $m > p + q$ and $p \ne 0$: Apply the following decomposition. Let d be the
minimum depth of a leaf of F. Clearly, the subset of nodes of depth $\le d$
induces a full binary forest in F. Because F is optimal, the leaf containing
$s_m$ has depth d. If d = 0, $s_m$ is assigned to a trivial tree of F. In this
situation, the forest formed by the remaining p - 1 even trees and q odd trees
contains the remaining m - 1 symbols, and it must be optimal. Consequently,
$c(m,p,q) = c(m-1, p-1, q)$. Consider d > 0. Let F' be the forest obtained from
F by removing all nodes in levels less than d. Clearly, F' contains
$(p+q) \cdot 2^{d}$ trees, equally divided into even and odd trees. We know that
$s_m$ has been assigned to a trivial tree T' of F'. Clearly, the forest F' - T'
contains $(p+q) \cdot 2^{d-1} - 1$ even trees and $(p+q) \cdot 2^{d-1}$ odd
trees. Since F is (m,p,q)-optimal, F' - T' must be
$(m-1, (p+q) \cdot 2^{d-1} - 1, (p+q) \cdot 2^{d-1})$-optimal. Regarding F, all
nodes of F' have been shifted d levels. Therefore
$c(m,p,q) = c(m-1, (p+q) \cdot 2^{d-1} - 1, (p+q) \cdot 2^{d-1}) + d \cdot A_m$.

Next, we determine the interval $[d_{min}, d_{max}]$ of possible variation of d.
Clearly, $d_{min} = 0$, when the forest contains a trivial tree. Next, we find
$d_{max}$. The maximum depth in an (m,p,q)-optimal forest F, such that no leaf
of F has depth less than $d_{max}$, corresponds to a forest in which the p even
trees are trivial and the q odd ones are formed by one edge each, with possible
empty trees. There are $(p+q) \cdot 2^{d_{max}}$ nodes with depth $d_{max}$,
half even and half odd. Consequently,
$(p+q) \cdot 2^{d_{max}-1} \le m < 2 \cdot (p+q) \cdot 2^{d_{max}-1}$. That is,
$d_{max} - 1 \le \log \frac{m}{p+q} < d_{max}$, implying
$d_{max} = \lfloor \log \frac{m}{p+q} \rfloor + 1$.

Considering that the situations d = 0 and $d \ne 0$ have been handled separately
and that F is (m,p,q)-optimal, we obtain that c(m,p,q) is the minimum between
$c(m-1, p-1, q)$ and
$\min_{1 \le d \le d_{max}} \{ c(m-1, (p+q) 2^{d-1} - 1, (p+q) 2^{d-1}) + d \cdot A_m \}$.

□

Theorem 1 leads to a dynamic programming algorithm for determining
c(m,p,q), for all $0 \le m \le n$, $m \ge p \ge 0$ and $m \ge q \ge 1$.

Start by evaluating the function $A_i$ for $1 \le i \le n$. The first cost to be
computed is c(0,n,n), which is 0, since c(0,p,q) = 0, by (1). The parameter m
varies increasingly, $0 \le m \le n$. For each such m, vary p and q
decreasingly, $m \ge p \ge 0$ and $m \ge q \ge 1$. For each such triple m,p,q,
compute c(m,p,q) applying (1), (2) or (3), according to their values. The
computation stops when c(n,0,1) is calculated, as it is equal to our target
c(n,1,0). Observe that c(m,p,q) = c(m,q,p), in general. There are $O(n^3)$
subproblems. The evaluation of each one is performed in constant time, if by
equations (1) or (2), or in $O(\log n)$ time when the evaluation is by (3).
Consequently, the time complexity is $O(n^3 \log n)$. The space requirements are
$O(n^2)$, since for each m it suffices to maintain the subproblems
corresponding to m and m - 1.
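
A memoized version of the recurrence can be sketched in Python as follows. It
is a sketch under stated assumptions rather than the authors' Pascal program:
it reads case (1) as $m \le p + q$, reads the exponents in case (3) as $d - 1$,
takes $A_i$ to be the sum of the $i$ smallest weights (0 for $i \le 0$), treats
a forest with no trees and $m > 0$ symbols as infeasible, and evaluates
$c(n,1,0)$ top-down instead of filling the table in the order described above.
The function name is hypothetical.

```python
from functools import lru_cache
from itertools import accumulate


def optimal_even_cost(freqs):
    """Cost c(n, 1, 0) of an optimal even tree for the given weights."""
    f = sorted(freqs)                  # f[0] <= f[1] <= ..., as the paper assumes
    A = [0] + list(accumulate(f))      # A[i] = f_1 + ... + f_i, A[0] = 0
    INF = float("inf")

    @lru_cache(maxsize=None)
    def c(m, p, q):
        if m == 0:
            return 0                   # no symbols to place
        if p + q == 0:
            return INF                 # assumption: no tree can hold the symbols
        if m <= p + q:                 # case (1)
            return A[max(m - p, 0)]
        if p == 0:                     # case (2)
            return c(m, q, q) + A[m]
        best = c(m - 1, p - 1, q)      # case (3) with d = 0
        d = 1
        while (p + q) * 2 ** (d - 1) <= m:   # equivalent to d <= d_max
            k = (p + q) * 2 ** (d - 1)
            best = min(best, c(m - 1, k - 1, k) + d * A[m])
            d += 1
        return best

    return c(len(f), 1, 0)
```

For weights (1, 2) this gives 4, matching the even code {0, 11} with the
heavier symbol on the shorter encoding; for four equal unit weights it gives
10, i.e. an average length of 2.5, which agrees with the Huffman cost 2 plus
the 1/2 overhead reported for n a power of two.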

3 Heuristics 

In this section we describe two heuristics to obtain even codes. Heuristics 1 is
very simple and is based on a slight modification of the classical Huffman
algorithm [4]. Heuristics 2 simply adds some possible improvements to the previous
one. As we shall see, those improvements yield even codes very close to
the optimal ones.

3.1 Heuristics 1 

Given n symbols with probabilities $f_1, f_2, \ldots, f_n$, Heuristics 1 consists
of two steps:

Step 1. Run Huffman's algorithm in order to obtain a Huffman tree $T_H$ for
the n symbols.

Step 2. Convert $T_H$ into an even tree $T_{Heu1}$ in the following way: for each
odd leaf z corresponding to a symbol $s_i$, create two children $z_l$ and $z_r$
such that:

- the left child $z_l$ is an error leaf;

- the right child $z_r$ is the new encoding leaf corresponding to $s_i$. We call
$z_r$ an augmented leaf.

Observe that the overall running time of Heuristics 1 is $O(n \log n)$, since it
is dominated by Step 1. Step 2 can be easily done in $O(n)$ time.
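
The two steps can be sketched on bit strings instead of explicit trees: an odd
leaf is an encoding with an odd number of 1's, and giving it a right child
amounts to appending a 1. The code below is a simplified illustration, not the
authors' implementation; the function names are hypothetical, and merging
dictionaries of partial encodings is simpler but asymptotically slower than the
pointer-based $O(n \log n)$ construction.

```python
import heapq
from itertools import count


def huffman_code(freqs):
    """Return {symbol: bit string} for a dict of symbol weights (Step 1)."""
    tie = count()  # tie-breaker so that heap entries never compare dicts
    heap = [(f, next(tie), {s: ""}) for s, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + e for s, e in left.items()}
        merged.update({s: "1" + e for s, e in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


def heuristics1(freqs):
    """Step 2: extend every odd encoding with a 1 (the augmented leaf)."""
    code = huffman_code(freqs)
    return {s: e + "1" if e.count("1") % 2 else e for s, e in code.items()}


weights = {"a": 5, "b": 2, "c": 1, "d": 1}
huff = huffman_code(weights)
even = heuristics1(weights)
```

Every encoding in `even` has an even number of 1's and is at most one bit
longer than the corresponding Huffman encoding; the sibling obtained by
appending a 0 instead plays the role of the error leaf.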

3.2 Heuristics 2 

Now we present three possible ways to improve the heuristics previously
described. As we shall see, these improvements do not increase the running time
in practice, while producing a qualitative jump in performance with respect to
the cost of the generated code.

Improvement I. During Step 1 (execution of Huffman's algorithm), add
the following test:

Among the candidate pairs of partial trees to be merged at the beginning
of a new iteration, give preference to a pair of trees $T_1$ and $T_2$ such that
$T_1$ is trivial and $T_2$ is not.

In other words, the idea is to avoid merging trivial trees as much as possible.
The reason why this strategy is employed is explained in the sequel.

In $T_H$, there exist two sibling leaves for each merge operation of trivial
trees occurring along the algorithm. Of course, one of the siblings is
guaranteed to be an odd leaf. When we force a trivial tree to be merged with a
non-trivial one, we minimize the number of pairs of sibling leaves in $T_H$, and
thus the number of those "guaranteed odd" leaves. In many cases, this strategy
lowers the additional cost needed to produce the even tree in Step 2.

Let us denote by $T_H^{I}$ the Huffman tree obtained by Improvement I. It is
worth remarking that this improvement does not affect the essence of Huffman's
algorithm, since $T_H^{I}$ is a plausible tree.

Moreover, it is possible to implement Improvement I in constant time by
keeping, during the execution of Huffman's algorithm, two heaps H' and H'',
where the nodes of H' contain trivial trees and the nodes of H'' the remaining
ones. At the beginning of the algorithm, H' contains n nodes and H'' is empty.
When starting a new iteration, simply test whether the roots of H' and H'' form
a candidate pair of partial trees to be merged; if so, merge them.

Improvement II. Change $T_H^{I}$ by repeatedly applying the following
operation in increasing depth order:

If there exist two nodes z' and z'' at the same depth of $T_H^{I}$ such that z'
is an odd leaf and z'' is an even internal node, exchange the positions of z'
and z''.

Observe that each single application of the above operation decreases the number
of odd leaves in $T_H^{I}$ by one unit. Each time we find k odd leaves and
$\ell$ even internal nodes at some depth i, we perform $\min\{k, \ell\}$ changes
and proceed to depth i+1.

It is clear that the number of changes is bounded by the number of leaves
of $T_H$. Since a single change can be done by modifying a constant number of
pointers, the overall complexity of Improvement II is $O(n)$.

Denote by $T_H^{II}$ the Huffman tree obtained by Improvement II. Again, the
essence of Huffman's algorithm is not affected, since $T_H^{II}$ is still
plausible.

Improvement III. Apply Step 2 on $T_H^{II}$. Let T be the even tree obtained.
Then redistribute the symbols among the leaves of T as follows:

Whenever there exist two leaves $z_i, z_j$ (of any parities) in T with depths
$d_i < d_j$, representing symbols $s_i, s_j$ with probabilities $f_i < f_j$,
respectively, then exchange the symbols assigned to $z_i$ and $z_j$.

Observe that each single re-assignment performed above reduces the cost of
the resulting even tree by $(d_j - d_i)(f_j - f_i)$.

The entire process can be implemented in the following way: after applying
Step 2, let $z_i$ be the leaf of the tree representing the symbol $s_i$ with
probability $f_i$, for $i = 1, \ldots, n$. Assume that the depth of $z_i$ is
$d_i$. Run first a bucket sort on the values $d_i$, and then assign the leaf
with the i-th smallest depth to the symbol $s_{n-i+1}$ with probability
$f_{n-i+1}$. (Recall that $f_1 \le f_2 \le \cdots \le f_n$.) The time required
for this operation is therefore $O(n)$. Consequently, the overall time bound for
Heuristics 2 is $O(n \log n)$, dominated by Huffman's algorithm in Step 1, with
$O(n)$ space.
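
The redistribution of Improvement III amounts to pairing the shallowest leaves
with the most probable symbols. A minimal Python sketch follows; the function
names are hypothetical, and it uses the built-in comparison sort for brevity
where the text uses a bucket sort on the depths.

```python
def redistribute(depths, probs):
    """Pair the shallowest leaves with the most probable symbols.

    Returns a dict mapping leaf index -> symbol index.
    """
    leaves = sorted(range(len(depths)), key=lambda i: depths[i])
    symbols = sorted(range(len(probs)), key=lambda j: probs[j], reverse=True)
    return dict(zip(leaves, symbols))


def cost(depths, probs, assignment):
    """Sum of depth times weight over all (leaf, symbol) pairs."""
    return sum(depths[leaf] * probs[sym] for leaf, sym in assignment.items())


depths = [3, 1, 2, 3]            # depth of leaf i in the even tree
probs = [4, 1, 3, 2]             # weight of symbol i (unnormalized)
identity = {i: i for i in range(4)}
before = cost(depths, probs, identity)
after = cost(depths, probs, redistribute(depths, probs))
```

In this example the cost drops from 25 to 19; by the exchange argument above,
no other assignment of these symbols to these leaves is cheaper.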



4 Comparisons 

In this section, we first present an analytical upper bound for the cost of the
even tree generated by Heuristics 2 with respect to the cost of the corresponding
Huffman tree. Then we will exhibit some experimental results.
The terminology employed in this section is the following: given n symbols
with probabilities $f_1, f_2, \ldots, f_n$, $T_H$ is the Huffman tree for these
symbols; $T_E$ is the corresponding optimal even tree for these symbols;
$T_{Heu1}$ is the even tree obtained by applying Heuristics 1; and $T_{Heu2}$ is
the even tree obtained by applying Heuristics 2. Observe that

$$c(T_H) \le c(T_E) \le c(T_{Heu2}) \le c(T_{Heu1}).$$



4.1 An Analytical Bound

Theorem 2. $c(T_{Heu2}) \le 1.5\, c(T_H)$.

Proof:

This bound is due to Improvements II and III. Let $n_{ol}(k)$, $n_{oi}(k)$,
$n_{el}(k)$, $n_{ei}(k)$ be the number of odd leaves, odd internal nodes, even
leaves and even internal nodes at depth k of $T_H^{II}$. Then either
$n_{ol}(k) = 0$ or $n_{ei}(k) = 0$ (1). Moreover, it is clear that
$n_{ol}(k) + n_{oi}(k) = n_{el}(k) + n_{ei}(k)$ (2).

We claim that $n_{ol}(k) \le n_{el}(k)$. Otherwise, if $n_{ol}(k) > n_{el}(k)$,
then $n_{ol}(k) > 0$, which implies $n_{ei}(k) = 0$ from (1). But in this case
$n_{ol}(k) + n_{oi}(k) = n_{el}(k)$ from (2), that is,
$n_{ol}(k) \le n_{el}(k)$, a contradiction.

By summing up $n_{ol}(k)$ and $n_{el}(k)$ for all values of k, we conclude that
the number of odd leaves is less than or equal to the number of even leaves in
$T_H^{II}$. That is, the number of odd leaves is at most n/2 and is less than or
equal to the number of merge operations between two trivial trees in Huffman's
algorithm. Next, Step 2 puts odd leaves one level deeper, in order to convert
$T_H^{II}$ into an even tree.

Now, when applying Improvement III, the probabilities are redistributed in
the tree, in such a way that two leaves $z_i, z_j$ with depths $d_i, d_j$ and
probabilities $f_i, f_j$ satisfy the condition $f_i > f_j \Rightarrow d_i \le d_j$.
Consequently, there exists a one-to-one correspondence between the set of
augmented leaves and a subset of the even leaves such that if $z_i$ is an
augmented leaf and $z_j$ is its corresponding even leaf then $f_i \le f_j$. Thus:

$$c(T_{Heu2}) \le c(T_H) + \sum_{z_i \text{ augmented}} f_i \le c(T_H) + c(T_H)/2 = 1.5\, c(T_H) \qquad \Box$$



4.2 Experimental Results 

The experimental results are summarized in Tables 2 to 4. The tables present the
costs of the trees obtained by the algorithms, for several values of n. They were
obtained from a program written in Pascal, running on a 1.8 GHz Pentium IV
computer with 256 MB of RAM.

In Tables 2 and 3 we compare $c(T_H)$, $c(T_E)$, $c(T_{Heu2})$ and
$c(T_{Heu1})$, for n in the range 64 to 1024. In the first case (Table 2) we use
uniform probabilities, and in the second one (Table 3), arbitrary
probabilities, obtained from the standard
Table 2. Comparisons with uniform probabilities. The last three columns give
the relative cost increase (c(T) - c(T_H)) / c(T_H), in percent, for T = T_E,
T_Heu1 and T_Heu2, respectively.

    n   c(T_H)   c(T_E)  c(T_Heu1)  c(T_Heu2)   T_E %  T_Heu1 %  T_Heu2 %
   64     6.00     6.50       6.50       6.50    8.33      8.33      8.33
  128     7.00     7.50       7.50       7.50    7.14      7.14      7.14
  192     7.67     8.00       8.17       8.00    4.35      6.52      4.35
  256     8.00     8.50       8.50       8.50    6.25      6.25      6.25
  320     8.40     8.80       8.90       8.80    4.76      5.95      4.76
  384     8.67     9.00       9.17       9.00    3.85      5.77      3.85
  448     8.86     9.29       9.36       9.29    4.84      5.65      4.84
  512     9.00     9.50       9.50       9.50    5.56      5.56      5.56
  576     9.22     9.67       9.72       9.67    4.82      5.42      4.82
  640     9.40     9.80       9.90       9.80    4.26      5.32      4.26
  704     9.55     9.91      10.05       9.91    3.81      5.24      3.81
  768     9.67    10.00      10.17      10.00    3.45      5.17      3.45
  832     9.77    10.15      10.27      10.15    3.94      5.12      3.94
  896     9.86    10.29      10.36      10.29    4.35      5.07      4.35
  960     9.93    10.40      10.43      10.40    4.70      5.03      4.70
 1024    10.00    10.50      10.50      10.50    5.00      5.00      5.00



Table 3. Comparisons with arbitrary probabilities. The last three columns give
the relative cost increase (c(T) - c(T_H)) / c(T_H), in percent, for T = T_E,
T_Heu1 and T_Heu2, respectively.

    n   c(T_H)   c(T_E)  c(T_Heu1)  c(T_Heu2)   T_E %  T_Heu1 %  T_Heu2 %
   64     5.82     5.97       6.38       6.03    2.65      9.69      3.66
  128     6.76     6.88       7.28       6.92    1.86      7.75      2.43
  192     7.33     7.52       7.84       7.59    2.65      6.96      3.54
  256     7.75     7.87       8.25       7.90    1.61      6.47      1.95
  320     8.08     8.20       8.57       8.30    1.51      6.13      2.71
  384     8.34     8.54       8.85       8.60    2.48      6.11      3.12
  448     8.52     8.68       9.04       8.73    1.88      6.00      2.40
  512     8.72     8.84       9.22       8.87    1.34      5.76      1.71
  576     8.92     9.03       9.39       9.09    1.24      5.30      1.98
  640     9.10     9.22       9.60       9.31    1.35      5.55      2.34
  704     9.20     9.33       9.69       9.41    1.43      5.42      2.37
  768     9.33     9.53       9.83       9.58    2.05      5.27      2.67
  832     9.44     9.63       9.93       9.67    1.98      5.22      2.41
  896     9.57     9.73      10.08       9.77    1.70      5.31      2.14
  960     9.64     9.78      10.14       9.82    1.47      5.22      1.87
 1024     9.75     9.87      10.24       9.89    1.27      5.11      1.49



Pascal generation routine for random numbers in the range 1 to 10000 (we found 
no significant variations changing this range) . All the probabilities were further 
normalized so that the total sum is 1. In Table 4 we compare the heuristics with 
Huffman’s algorithm for n in the range 1000 to 100000. 
Table 4. Comparisons with arbitrary probabilities. The last two columns give
the relative cost increase (c(T) - c(T_H)) / c(T_H), in percent, for T = T_Heu1
and T_Heu2, respectively.

      n   c(T_H)  c(T_Heu1)  c(T_Heu2)  T_Heu1 %  T_Heu2 %
   1000     9.71      10.22       9.87      5.23      1.61
   2000    10.72      11.21      10.87      4.58      1.45
   3000    11.31      11.81      11.57      4.46      2.29
   4000    11.72      12.22      11.87      4.27      1.34
   5000    12.05      12.55      12.26      4.13      1.73
  10000    13.04      13.55      13.25      3.85      1.56
  15000    13.61      14.11      13.80      3.65      1.37
  20000    14.05      14.55      14.25      3.55      1.46
  25000    14.36      14.86      14.61      3.50      1.77
  30000    14.62      15.12      14.81      3.44      1.26
  35000    14.85      15.35      15.01      3.38      1.11
  40000    15.05      15.55      15.25      3.33      1.35
  45000    15.21      15.71      15.45      3.28      1.54
  50000    15.36      15.86      15.61      3.25      1.65
  55000    15.49      15.99      15.71      3.23      1.43
  60000    15.62      16.12      15.80      3.21      1.19
  65000    15.74      16.24      15.89      3.18      0.95
  70000    15.85      16.35      16.01      3.16      1.03
  75000    15.95      16.45      16.14      3.14      1.16
  80000    16.04      16.54      16.25      3.12      1.26
  85000    16.13      16.63      16.35      3.09      1.36
  90000    16.21      16.71      16.45      3.09      1.45
  95000    16.29      16.79      16.54      3.06      1.53
 100000    16.36      16.86      16.61      3.05      1.55



The main result observed in Table 2 is that, for uniform probabilities,
Heuristics 2 matches the Exact Algorithm, while Heuristics 1 does not. The main
explanation for this fact is that, when the Huffman tree is a complete binary
tree, the improvements of Heuristics 2 apply very well. One can also observe the
small difference between all methods and Huffman’s algorithm, and the decrease
of the relative costs as n increases. The table also confirms a theoretical
result stated in [11]: the cost difference between the optimal even tree and the
Huffman tree lies in the interval [1/3, 1/2], being maximum (1/2) when the
number of symbols is $n = 2^k$ for some integer k, and minimum (1/3) when
$n = 3 \cdot 2^k$.
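
As a rough check of the Huffman column for uniform probabilities, the following Python sketch computes the cost of a Huffman tree, assuming that the cost c(T) is the expected codeword length (an assumption of this sketch, consistent with the value 10.00 reported in Table 2 for n = 1024). The function names are illustrative only.

```python
import heapq
from fractions import Fraction

def huffman_cost(probabilities):
    """Expected codeword length of a Huffman tree.

    The cost equals the sum of the weights of all internal nodes, that is,
    the sum of every pair of weights merged by Huffman's algorithm.
    """
    heap = list(probabilities)
    heapq.heapify(heap)
    cost = Fraction(0)
    while len(heap) > 1:
        a = heapq.heappop(heap)       # two smallest remaining weights
        b = heapq.heappop(heap)
        cost += a + b                 # weight of the new internal node
        heapq.heappush(heap, a + b)
    return cost

def uniform_huffman_cost(n):
    """Huffman cost for n symbols of probability 1/n (exact arithmetic)."""
    return huffman_cost([Fraction(1, n)] * n)

print(float(uniform_huffman_cost(1024)))            # 10.0
print(round(float(uniform_huffman_cost(768)), 2))   # 9.67
```

Under that assumption, adding the extreme differences 1/2 (for n = 1024 = 2^10) and 1/3 (for n = 768 = 3 · 2^8) to these values gives 10.50 and 10.00, which agree with the optimal even tree costs listed in Table 2 for those sizes.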

Now, examine the results presented in Table 3, for arbitrary probabilities.
First, compare data from Tables 2 and 3. We can see that all data in columns
2 to 5 in Table 3 are smaller than the corresponding ones in Table 2, a fact
that is more related to the situation of arbitrary probabilities than to the
methods. The relative difference between $c(T_E)$ and $c(T_H)$ decreases
considerably as n increases. The same happens for Heuristics 2, suggesting that
it is also well suited to this situation, although it does not equal the optimal
solution. However, for Heuristics 1, the behavior is quite different. Both the
absolute value of the difference to $c(T_H)$ and the relative value increased.
So, Heuristics 2 has a great advantage in this situation.

Table 4 illustrates the costs obtained for large values of n and arbitrary
probabilities. The costs compared are $c(T_H)$, $c(T_{Heu1})$ and
$c(T_{Heu2})$. The main results obtained from Table 3 are confirmed, that is,
Heuristics 2 is far better than Heuristics 1. Moreover, the relative differences
of costs from the two heuristics to Huffman’s algorithm again decrease. Those
differences become negligible for large values of n.

Finally, from the three tables, we can confirm how loose the theoretical upper
bound presented in Subsection 4.1 was, since all the relative differences
between the costs of the even trees obtained by Heuristics 2 and the Huffman
trees were at most 5%, for n large enough. It would be interesting to search for
tighter bounds for this situation.

Abstract. In this study, a lifting procedure is applied to some existing
formulations of the Diameter Constrained Minimum Spanning Tree Problem. This
problem typically models network design applications where all vertices must
communicate with each other at minimum cost, while meeting or surpassing a
given quality requirement. An alternative formulation is also proposed for
instances of the problem where the diameter of feasible spanning trees cannot
exceed given odd numbers. This formulation dominated its counterparts in this
study, in terms of the computation time required to obtain proven optimal
solutions. The first computational results for complete graph instances of the
problem are presented here. Sparse graph instances as large as those found in
the literature were solved to proven optimality for the case where diameters
cannot exceed given odd numbers. For these applications, the corresponding
computation times are competitive with those found in the literature.



1 Introduction 

Let $G = (V, E)$ be a finite undirected connected graph with a set V of vertices
and a set E of edges. Assume that a cost $c_{ij}$ is associated with every edge
$[i,j] \in E$, with $i < j$. Denote by $T = (V, E')$ a spanning tree of G, with
$E' \subseteq E$. For every pair of distinct vertices $i, j \in V$, there exists
a unique path $\mathcal{P}_{ij}$ in T linking i and j. Denote by $d_{ij}$ the
number of edges in $\mathcal{P}_{ij}$ and by $d = \max\{d_{ij} : i, j \in V\}$
the diameter of T. Given a positive integer D with $2 \le D \le |V| - 1$, the
Diameter Constrained Minimum Spanning Tree Problem (DCMST) is to find a
minimum cost spanning tree T with $d \le D$.
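
The diameter constraint in this definition can be checked with two breadth-first searches, a standard technique for trees. The sketch below is only illustrative: vertices are assumed to be numbered 0 to n - 1 and the tree is given as a list of edges; the function names are not taken from the paper.

```python
from collections import deque

def farthest(adj, source):
    """Return a vertex farthest from source and its distance, in edges."""
    dist = {source: 0}
    queue = deque([source])
    last = source
    while queue:
        v = queue.popleft()
        last = v                      # BFS pops vertices by nondecreasing distance
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return last, dist[last]

def tree_diameter(n, tree_edges):
    """Diameter d = max d_ij of a spanning tree on vertices 0..n-1."""
    adj = {v: [] for v in range(n)}
    for i, j in tree_edges:
        adj[i].append(j)
        adj[j].append(i)
    end, _ = farthest(adj, 0)         # one end of a longest path
    _, d = farthest(adj, end)         # the other end gives the diameter
    return d

def is_diameter_feasible(n, tree_edges, D):
    return tree_diameter(n, tree_edges) <= D
```

For instance, a path on four vertices has diameter 3, while a star has diameter 2 regardless of its number of leaves.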

DCMST has been shown to be NP-hard when $D \ge 4$ [6]. The problem typically
models network design applications where all vertices must communicate with
each other at minimum cost, while meeting or surpassing a given quality
requirement [7]. Additional applications are found in data compression [3] and
distributed mutual exclusion in parallel computing [4, 11].

DCMST formulations in the literature implicitly use a property of feasible
diameter constrained spanning trees, pointed out by Handler [8]. Consider first
the case where D is even. Handler noted that a central vertex $i \in V$ must
exist in a feasible tree T, such that no other vertex of T is more than D/2
edges away from i. Conversely, if D is odd, a central edge $e = [i,j] \in E$
must exist in T, such that no vertex of T is more than $(D-1)/2$ edges away
from the closest extremity of $[i,j]$. Another feature shared by these
formulations is that, in addition to the use of natural space variables (i.e.
variables associated with the edges of G, for this application), the central
vertex (resp. edge) property of T is enforced through the use of an auxiliary
network flow structure. In doing so, connectivity of T is naturally enforced by
these structures.
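
Handler's property can be illustrated by computing the center of a tree through repeated removal of leaves. The sketch below is a standard leaf-peeling procedure, not code from the paper; it assumes vertices numbered 0 to n - 1 and returns one vertex when the tree diameter is even and the two extremities of the central edge when it is odd.

```python
def tree_center(n, tree_edges):
    """Return the central vertex [v] or the central edge ends [v, w] of a tree."""
    adj = {v: set() for v in range(n)}
    for i, j in tree_edges:
        adj[i].add(j)
        adj[j].add(i)
    remaining = n
    leaves = [v for v in adj if len(adj[v]) <= 1]
    while remaining > 2:
        remaining -= len(leaves)
        new_leaves = []
        for leaf in leaves:
            for neighbour in adj[leaf]:           # a leaf has a single neighbour
                adj[neighbour].discard(leaf)
                if len(adj[neighbour]) == 1:      # neighbour has just become a leaf
                    new_leaves.append(neighbour)
            adj[leaf].clear()
        leaves = new_leaves
    return sorted(leaves)

print(tree_center(5, [(0, 1), (1, 2), (2, 3), (3, 4)]))   # [2]
print(tree_center(4, [(0, 1), (1, 2), (2, 3)]))           # [1, 2]
```

In the first example the diameter is 4 and every vertex is within 2 edges of vertex 2; in the second the diameter is 3 and every vertex is within 1 edge of the closest extremity of edge [1, 2].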

The formulation proposed in [1,2] for even D relies on an artificial vertex
to model central spanning tree vertices. For odd D, however, the corresponding
formulation in [1,2] does not use either artificial vertices or edges.
Similarly, formulations in [7], irrespective of D being odd or even, do not rely
on artificial vertices or edges. Another distinction between formulations in
[1,2] and those in [7] is that the former contain multicommodity network flow
structures, while the latter contain single commodity network flow ones. As a
result, tighter linear programming relaxations are obtained in [7], albeit at a
much larger computer memory requirement.

Achuthan et al. [1,2] do not present computational results for their DCMST
formulation. Gouvea and Magnanti [7] used the Mixed Integer Programming
(MIP) solver CPLEX 5.0 to test their formulation only on fairly sparse
graphs.

In this paper, we introduce an alternative form of enforcing the central edge 
property for the odd D case of DCMST. The proposed model is based on the 
use of an artificial vertex. We also apply a lifting procedure to strengthen the 
formulations in [1,2]. Original formulations and lifted versions of them were 
tested, under the MIP solver CPLEX 9.0, on complete graph instances as well 
as on sparse graph ones. For the computational results obtained, lifted versions of 
the formulations invariably required significantly less computation time to prove 
optimality than their unlifted counterparts. That feature was further enhanced 
for the odd D case, with the use of the artificial vertex model. 

In Section 2, a summary of the main results for formulations in [1,2] is 
presented. In Section 3, the artificial vertex DCMST formulation for odd D 
is described. Strengthened (i.e. lifted) versions of the formulations in [1,2] are 
presented in Section 4. Computational experiments for dense and sparse graph 
instances are reported in Section 5. In these experiments, denser instances than 
previously attempted in the literature were solved to proven optimality. Con- 
cluding remarks are made in Section 6. 

2 Formulations 

Formulations in this study make use of a directed graph $G' = (V, A)$. Graph G'
is obtained from the original undirected graph $G = (V, E)$, as follows. For
every edge $e = [i,j] \in E$, with $i < j$, there exist two arcs $(i,j)$ and
$(j,i) \in A$, with costs $c_{ij} = c_{ji}$. Let $L = D/2$ if D is even and
$L = (D-1)/2$, otherwise.

The very first formulations for DCMST were proposed by Achuthan et al. [1,2].
Distinct formulations are presented by the authors for even D and odd D
cases of the problem. Consider first the case where D is even and introduce an
artificial vertex, denoted by r, into G'. Let $G'' = (V', A')$ be the resulting
graph with $V' = V \cup \{r\}$ and $A' = A \cup \{(r,1), \ldots, (r,|V|)\}$.
Associate a binary variable $x_{ij}$ with every arc $(i,j) \in A'$ and a
non-negative variable $u_i$ with every vertex $i \in V'$. Binary variables
$x_{ij}$ are used to identify a spanning tree, while variable $u_i$ denotes the
number of arcs in a path from r to $i \in V$. For even D, DCMST is formulated
as follows:

$$\min \sum_{(i,j) \in A} c_{ij} x_{ij} \tag{1}$$

$$\sum_{j \in V} x_{rj} = 1 \tag{2}$$

$$\sum_{(i,j) \in A'} x_{ij} = 1 \quad \forall j \in V \tag{3}$$

$$u_i - u_j + (L+1) x_{ij} \le L \quad \forall (i,j) \in A' \tag{4}$$

$$x_{ij} \in \{0,1\} \quad \forall (i,j) \in A' \tag{5}$$

$$0 \le u_i \le L+1 \quad \forall i \in V'. \tag{6}$$



Equation (2) ensures that the artificial vertex r is connected to exactly one
vertex in V, i.e. the central spanning tree vertex. Constraints (3) establish
that exactly one arc must be incident to each vertex of V. Constraints (4) and
(6) ensure that paths from the artificial vertex r to each vertex $i \in V$
have at most L + 1 arcs. Constraints (5) are the integrality requirements.
Edges $[i,j] \in E$ such that $x_{ij} = 1$ or $x_{ji} = 1$ in a feasible
solution to (2)-(6) define a spanning tree T of G with diameter less than or
equal to D.
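
To make the role of constraints (2)-(6) concrete, the following sketch builds the arc set and the labels $u_i$ from a spanning tree and a chosen central vertex, and then checks the constraints. It is only an illustration under stated assumptions: vertices are numbered 0 to n - 1, the artificial vertex is represented by the string "r", the underlying graph is taken to be complete (so constraint (4) is checked for every ordered pair), and all names are hypothetical.

```python
from collections import deque

R = "r"   # label used in this sketch for the artificial vertex

def orient_from_center(n, tree_edges, center):
    """Orient the tree away from r and set u_i to the number of arcs from r."""
    adj = {v: [] for v in range(n)}
    for i, j in tree_edges:
        adj[i].append(j)
        adj[j].append(i)
    arcs = {(R, center)}              # arcs with x_ij = 1
    u = {R: 0, center: 1}
    queue = deque([center])
    while queue:
        i = queue.popleft()
        for j in adj[i]:
            if j not in u:
                u[j] = u[i] + 1
                arcs.add((i, j))
                queue.append(j)
    return arcs, u

def satisfies_even_model(n, arcs, u, L):
    """Check constraints (2), (3), (4) and (6) for a 0-1 solution given by arcs."""
    if sum(1 for (i, _) in arcs if i == R) != 1:              # (2)
        return False
    for j in range(n):                                        # (3)
        if sum(1 for (_, head) in arcs if head == j) != 1:
            return False
    if any(not (0 <= u[v] <= L + 1) for v in u):              # (6)
        return False
    for i in [R] + list(range(n)):                            # (4)
        for j in range(n):
            if i != j:
                x_ij = 1 if (i, j) in arcs else 0
                if u[i] - u[j] + (L + 1) * x_ij > L:
                    return False
    return True
```

On a path with five vertices and D = 4 (so L = 2), choosing the middle vertex as center yields a feasible solution, whereas choosing an end vertex violates the bound $u_i \le L + 1$.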

We now consider the odd D case of DCMST. Let $z_{ij}$ be a binary variable
associated with each edge $[i,j] \in E$, with $i < j$. Whenever $z_{ij} = 1$,
edge $[i,j]$ is selected as the central spanning tree edge. Otherwise,
$z_{ij} = 0$. For D odd, DCMST is formulated as follows:

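$$\min \sum_{(i,j) \in A} c_{ij} x_{ij} + \sum_{[i,j] \in E} c_{ij} z_{ij} \tag{7}$$

$$\sum_{[i,j] \in E} z_{ij} = 1 \tag{8}$$

$$\sum_{i:(i,j) \in A} x_{ij} + \sum_{i:[i,j] \in E} z_{ij} + \sum_{k:[j,k] \in E} z_{jk} = 1 \quad \forall j \in V \tag{9}$$

$$u_i - u_j + (L+1) x_{ij} \le L \quad \forall (i,j) \in A \tag{10}$$

$$x_{ij} \in \{0,1\} \quad \forall (i,j) \in A \tag{11}$$

$$z_{ij} \in \{0,1\} \quad \forall [i,j] \in E \tag{12}$$

$$0 \le u_i \le L \quad \forall i \in V. \tag{13}$$

Solving Diameter Constrained Minimum Spanning Tree Problems 461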



Equation (8) ensures that there must be exactly one central edge. Constraints
(9) establish that for any vertex $i \in V$ either there is an arc incident to
it or else vertex i must be one of the extremities of the central spanning tree
edge. Constraints (10) and (13) ensure that spanning tree paths from the closest
extremity of the central edge to every other vertex $i \in V$ have at most L
arcs. Constraints (11) and (12) are the integrality requirements. In a feasible
solution to (8)-(13), the central edge together with those edges $[i,j] \in E$
such that $x_{ij} = 1$ or $x_{ji} = 1$ define a spanning tree T of G with
diameter less than or equal to D.



3 An Alternative Formulation for the Odd D Case 

The formulation in [1,2] for D odd selects one edge in E to be central and
simultaneously builds an auxiliary network flow problem around that edge. Flow
emanating from the central edge is then controlled to enforce the diameter
constraint. Figure 1 (a) illustrates a solution obtained for D = 3. Notice that
edge $[p,q]$ plays the central edge role and that any spanning tree leaf is no
more than $L = (D-1)/2 = 1$ edges away from edge $[p,q]$.

An alternative formulation which uses an artificial vertex r, as for the even
D case, is also possible here. Recall that, for D even, the artificial vertex r
is connected to exactly one vertex of V, i.e. the central spanning tree vertex.
Now, the artificial vertex r will be connected to exactly two vertices, namely
those two vertices incident on the central edge. This situation is modeled by
implicitly enforcing selection of a central edge $[p,q] \in E$ by explicitly
forcing artificial edges $[p,r]$ and $[q,r]$ to appear in the solution. An
illustration of this scheme appears in Figure 1 (b). A feasible spanning tree T
of G is obtained by eliminating the two edges incident on r and connecting
their extremities through the central edge $[p,q]$.

The motivation behind our formulation for D odd is to highlight a structure 
that has already been well studied from a polyhedral viewpoint. In doing so, we 
expect to strengthen the overall DCMST formulation through the use of facet 
defining inequalities for that structure. 

Consider the same notation and variables introduced in Section 2 for D odd.
Additional variables $x_{ri}$, for every $i \in V$, are introduced to represent
the edges associated with artificial vertex r. An edge $[i,j] \in E$ is selected
as the central spanning tree edge if and only if edges $[r,i]$ and $[r,j]$ are
also selected. This condition is enforced through the nonlinear equation
$z_{ij} = x_{ri} \cdot x_{rj}$ or convenient linearizations of it. A valid
formulation for DCMST when D is odd is given by:

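$$\min \sum_{(i,j) \in A} c_{ij} x_{ij} + \sum_{[i,j] \in E} c_{ij} z_{ij} \tag{14}$$

$$\sum_{j \in V} x_{rj} = 2 \tag{15}$$

$$\sum_{(i,j) \in A'} x_{ij} = 1 \quad \forall j \in V \tag{16}$$

$$\sum_{[i,j] \in E} z_{ij} = 1 \tag{17}$$

$$z_{ij} \ge x_{ri} + x_{rj} - 1 \quad \forall [i,j] \in E \tag{18}$$

$$z_{ij} \le x_{ri} \quad \forall [i,j] \in E \tag{19}$$

$$z_{ij} \le x_{rj} \quad \forall [i,j] \in E \tag{20}$$

$$u_i - u_j + (L+1) x_{ij} \le L \quad \forall (i,j) \in A' \tag{21}$$

$$0 \le u_i \le L+1 \quad \forall i \in V \tag{22}$$

$$x_{ij} \in \{0,1\} \quad \forall (i,j) \in A' \tag{23}$$

$$z_{ij} \in \{0,1\} \quad \forall [i,j] \in E. \tag{24}$$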

Equation (15) establishes that the artificial central vertex r is connected to
exactly two vertices of V. Constraints (16) establish that there is exactly one
arc incident to each vertex of V. Constraints (17) to (20) give a linearization
of $z_{ij} = x_{ri} \cdot x_{rj}$ for every edge $[i,j] \in E$. Finally,
constraints (21) and (22) ensure that the paths from the artificial vertex r to
each vertex $i \in V$ have at most L + 1 arcs. Constraints (23) and (24) are
the integrality requirements.
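
That inequalities (18)-(20) reproduce the product $z_{ij} = x_{ri} \cdot x_{rj}$ on binary values can be confirmed by enumerating the four possible cases. The short check below is only an illustration; the function name is hypothetical.

```python
from itertools import product

def feasible_z(x_ri, x_rj):
    """Binary values of z_ij allowed by inequalities (18), (19) and (20)."""
    return [z for z in (0, 1)
            if z >= x_ri + x_rj - 1 and z <= x_ri and z <= x_rj]

for x_ri, x_rj in product((0, 1), repeat=2):
    # the only admissible value is the product of the two binary variables
    assert feasible_z(x_ri, x_rj) == [x_ri * x_rj]
```

For fractional values of $x_{ri}$ and $x_{rj}$ the three inequalities only bound $z_{ij}$, which is why further valid inequalities can tighten the linear programming relaxation.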

We now derive valid inequalities for the formulation (14)-(24). If constraint
(15) is multiplied by variable $x_{ri}$, for $i \in V$,

$$\sum_{j \in V} x_{ri} x_{rj} = 2 x_{ri} \tag{25}$$

results. Bearing in mind that all variables in (25) are binary 0-1 and
consequently $x_{ri} = x_{ri} \cdot x_{ri}$ holds, it is valid to write

$$\sum_{j \in V \setminus \{i\}} x_{ri} x_{rj} = x_{ri}. \tag{26}$$

However, since $z_{ij} = x_{ri} \cdot x_{rj}$ for every edge $[i,j] \in E$ and
$x_{ri}$ and $x_{rj}$ cannot simultaneously be equal to 1 if an edge $[i,j]$
does not exist in E, valid constraints for (14)-(24) are

$$\sum_{j:[i,j] \in E} z_{ij} + \sum_{j:[j,i] \in E} z_{ji} = x_{ri} \quad \forall i \in V. \tag{27}$$

Constraints (27) are redundant for formulation (14)-(24) but are not necessarily
so for its linear programming relaxation.

Additional valid inequalities for (14)-(24) can be found if one concentrates 
on inequalities (18)-(20) and the underlying Boolean quadric polytope [10]. 

4 Lifting 

In this section, following the work of Desrochers and Laporte [5], we lift the
Miller-Tucker-Zemlin [9] inequalities $u_i - u_j + (L+1) x_{ij} \le L$,
$\forall (i,j) \in A'$. In doing so, strengthened versions of the DCMST
formulations presented in previous sections are obtained. The idea of lifting
consists in adding a valid nonnegative term $\alpha_{ji} x_{ji}$ to the above
inequalities, transforming them into

$$u_i - u_j + (L+1) x_{ij} + \alpha_{ji} x_{ji} \le L. \tag{28}$$

The larger the value of $\alpha_{ji}$, the larger the reduction in the original
solution space. If $x_{ji} = 0$, then $\alpha_{ji}$ may take any value. Suppose
now $x_{ji} = 1$. Then, $u_i = u_j + 1$ since the path from the central vertex
in the even case (resp. from the closest extremity of the central edge in the
odd case) to vertex $i \in V$ visits j before visiting i. Moreover,
$x_{ji} = 1$ implies $x_{ij} = 0$ due to constraints (3) or (9). By
substitution in (28), we obtain

$$u_i - u_j + (L+1) x_{ij} + \alpha_{ji} x_{ji} \le L \;\Longrightarrow\; (u_j + 1) - u_j + \alpha_{ji} \le L \;\Longrightarrow\; \alpha_{ji} \le L - 1.$$

To maximize the value of $\alpha_{ji}$, we take $\alpha_{ji} = L - 1$. Then,

$$u_i - u_j + (L+1) x_{ij} + (L-1) x_{ji} \le L \tag{29}$$

is a valid inequality for all $(i,j) \in A'$ (resp. for all $(i,j) \in A$) for
$D \ge 2$ in the even case (resp. for $D \ge 3$ in the odd case).
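
The choice of the lifting coefficient can be checked numerically on the local cases used in the derivation. The sketch below is only an illustration, assuming that the labels of two distinct vertices take L + 1 consecutive integer values (here 1 to L + 1) and that an arc from i to j, when selected, implies that the label of j exceeds the label of i by exactly one; the function name and the parameter `alpha` (the coefficient of $x_{ji}$ in (28)) are hypothetical.

```python
def lifted_inequality_holds(L, alpha):
    """Check u_i - u_j + (L+1) x_ij + alpha x_ji <= L on all consistent cases."""
    for u_i in range(1, L + 2):
        for u_j in range(1, L + 2):
            cases = [(0, 0)]                 # neither arc selected
            if u_j == u_i + 1:
                cases.append((1, 0))         # arc (i, j) selected
            if u_i == u_j + 1:
                cases.append((0, 1))         # arc (j, i) selected
            for x_ij, x_ji in cases:
                if u_i - u_j + (L + 1) * x_ij + alpha * x_ji > L:
                    return False
    return True

for L in range(1, 6):
    assert lifted_inequality_holds(L, L - 1)      # coefficient used in (29)
    assert not lifted_inequality_holds(L, L)      # any larger integer fails
```

The failing case for a coefficient equal to L is the one singled out in the text: with the arc from j to i selected, the left-hand side becomes 1 + L, which exceeds L.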

We now derive improved generalized upper bounds for the variables $u_i$, for
$i \in V$. In the even case, there is an artificial vertex r such that
$u_r = 0$. The central vertex connected to r will necessarily be the first
vertex to be visited in any path emanating from r. Then,

$$u_i \le (L+1) - L x_{ri} \quad \forall i \in V. \tag{30}$$

Moreover, $u_i \le L$ for any vertex $i \in V$ which is not a leaf of the
spanning tree. Then,

$$u_i \le (L+1) - x_{ij} \quad \forall (i,j) \in A \tag{31}$$

holds.

We now consider the odd case. If an edge $[i,j] \in E$ is the central one, then
$z_{ij} = 1$, $u_i = 0$, and $u_j = 0$. In consequence,

$$u_i \le L - L z_{ij} \quad \forall [i,j] \in E. \tag{32}$$

Analogously to the even case, $u_i < L$ for any vertex $i \in V$ which is not a
leaf of the spanning tree. Then,

$$u_i \le L - x_{ij} \quad \forall (i,j) \in A. \tag{33}$$

Inequalities (30) and (31) define improved generalized upper bounds for the
even D case, while inequalities (32) and (33) correspond to new generalized
upper bounds for the odd D case.

We now derive improved generalized lower bounds for the variables $u_i$, for
$i \in V$. In the even case, $u_i \ge 1 \ge x_{ri}$ for any vertex $i \in V$.
If i is not directly connected to the central vertex, then $x_{ri} = 0$ and
$u_i \ge 2$. If these two conditions are taken into account simultaneously, we
have the first improvement in the lower bounds:

$$u_i \ge x_{ri} + 2 \sum_{j \in V' : j \ne r} x_{ji} \quad \forall i \in V. \tag{34}$$

The above condition is simpler in the odd case, where no central vertex exists:

$$u_i \ge \sum_{j \in V} x_{ji} \quad \forall i \in V. \tag{35}$$

5 Computational Results

Computational experiments were performed on a Pentium IV machine with a 2.0
GHz clock and 512 MB of RAM, using the MIP solver CPLEX 9.0 under
default parameters. In these experiments, the alternative formulation proposed
for the odd D case of DCMST was reinforced with valid inequalities (27).

Test instances were generated as follows. For a graph with $n = |V|$ vertices,
the uniform distribution was used to draw n points with integer coordinates in
a square of side 100 on the Euclidean plane. Vertices were associated with
points and edge costs were taken as the truncated Euclidean distance between
corresponding pairs of points. Sparse graph instances with $m = |E|$ edges were
generated as in [7]. The minimum cost spanning star is computed first and all
of its $n - 1$ edges are selected. The remaining $m - n + 1$ edges are taken as
the least cost edges not already contained in the minimum cost star. In all, 19
odd D instances and 18 even D instances were generated. For each of the two
cases, 12 complete graph instances (with up to 25 vertices) were generated.
Test instance details are summarized in Tables 1 and 2.
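
The generation procedure described above can be sketched as follows. This is an illustrative reading of the text, not the generator used for the experiments: the coordinate range 0 to 100 (both ends included), the tie-breaking rules, the `seed` parameter and all names are assumptions of this sketch, and m is assumed to lie between n - 1 and n(n - 1)/2.

```python
import math
import random
from itertools import combinations

def generate_instance(n, m=None, side=100, seed=None):
    """Return (points, cost), where cost maps each edge (i, j), i < j, to its cost.

    With m=None a complete graph is returned; otherwise a sparse graph with
    m edges containing the minimum cost spanning star.
    """
    rng = random.Random(seed)
    points = [(rng.randint(0, side), rng.randint(0, side)) for _ in range(n)]
    cost = {}
    for i, j in combinations(range(n), 2):
        dx = points[i][0] - points[j][0]
        dy = points[i][1] - points[j][1]
        cost[(i, j)] = math.isqrt(dx * dx + dy * dy)   # truncated Euclidean distance
    if m is None:
        return points, cost

    def star_cost(center):
        return sum(cost[(min(center, v), max(center, v))]
                   for v in range(n) if v != center)

    center = min(range(n), key=star_cost)              # minimum cost spanning star
    star = {(min(center, v), max(center, v)) for v in range(n) if v != center}
    others = sorted((e for e in cost if e not in star),
                    key=lambda e: (cost[e], e))        # cheapest remaining edges first
    chosen = star | set(others[:m - (n - 1)])
    return points, {e: cost[e] for e in sorted(chosen)}

points, cost = generate_instance(20, 50, seed=7)       # a sparse instance, as in Table 1
```

Including the spanning star guarantees that the sparse graph is connected and that it contains a spanning tree of diameter 2.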

Table 1 gives numerical results for odd D instances. For each instance, the
number of vertices, the number of edges, and the value of the diameter D are
given. These entries are followed by the results obtained with the original
formulation in [1,2] (A), the original formulation with lifting (B), and the new
artificial central vertex formulation with lifting (C). For each formulation,
the CPU time required to prove optimality is given in seconds together with the
number of nodes visited in the branch-and-bound tree. Table 2 gives the same
results for the even D case, except for the central vertex formulation which
does not apply in this case.

Table 1. Numerical results for the odd D case.

                     (A)                   (B)                   (C)
|V|  |E|   D   time (s)      nodes   time (s)      nodes   time (s)      nodes
 10   45   5       0.14         85       0.16         62       0.13         28
 10   45   7       0.14        190       0.17         88       0.17         49
 10   45   9       0.05         40       0.10         73       0.08         10
 15  105   5      40.95      73331      22.27      18640      31.22      21814
 15  105   7      85.13     163355      25.02      23226      19.23      16118
 15  105   9      76.14     146948      10.56       9597       7.80       6174
 20  190   5    1686.17    1753713     272.00     132678     216.36      95813
 20  190   7      31.16      36168       8.39       3532       5.05       1519
 20  190   9     884.36     982551      151.5      85204      91.33      46559
 25  300   5          -          -   73857.81   24309253   51551.80   17235497
 25  300   7   24513.01   15083987   20160.13    6225967   16617.61    5272473
 25  300   9  177024.22   66213391    9172.04    2438057   40183.44   12040315
 20   50   5      39.78      93283       7.74       9386       5.58       3707
 20   50   7      39.06      68843       3.16       4908       2.23       1915
 20   50   9      70.12     128376      26.16      45561      40.28      57603
 40  100   5    3596.34    3506756     780.74     688912      20.67      13993
 40  100   7   12581.45   12272065     976.98     849861     207.64     171846
 40  100   9   27735.56   23763469    4584.19    3530233   23359.05   15583400
 60  150   5          -          -  215161.75  116775357   11644.75    5003876

Table 2. Numerical results for the even D case.

                     (A)                   (B)
|V|  |E|   D   time (s)      nodes   time (s)      nodes
 10   45   4       0.95       1849       0.77       1300
 10   45   6       0.13         55       0.08          7
 10   45  10       0.06         29       0.08         17
 15  105   4       65.8      73024      24.17      26711
 15  105   6      53.19      66834      32.53      41882
 15  105  10      38.41      61822       8.95      11790
 20  190   4     7462.1    4877014    1888.02    1091803
 20  190   6    1630.58    1210813     593.91     412770
 20  190  10    2729.48    2285819     172.81     144382
 25  300   4          -          -  158836.45   64913343
 25  300   6   43044.61   17498605    5194.90    2119161
 25  300  10    1031.36     565737     459.88     205747
 20   50   4      62.47     115144       0.64        615
 20   50   6     221.31     446477      10.81      16396
 20   50  10     619.52    1443014      74.15     173234
 40  100   4    8957.38    8110166      54.14      51476
 40  100   6  205940.95  167119305     909.95    1012212
 40  100  10          -          -  146019.52  155646590

In spite of the considerable duality gaps associated with the formulations 
tested here, the lifted formulations we suggest are capable of solving, to proven 
optimality, sparse instances as large as those found in the literature in competi- 
tive CPU times. 

No results appear in the literature for complete graph DCMST instances. A 
possible explanation for that is the large computer memory demands required 
by the other existing DCMST formulations [1,2]. The very first computational 
results for complete graph DCMST instances are thus introduced in this study. 

From the computational results presented, it should also be noticed that our
alternative odd D case formulation dominates its counterparts in this study
in terms of the CPU time required to prove optimality.

6 Conclusions 

In this study, DCMST formulations proposed in [1,2] were strengthened through
the use of a lifting procedure. In doing so, substantial duality gap reductions
were attained for the computational experiments carried out. Additionally, we
also propose an artificial central vertex strategy for modeling the odd D case
of the problem. For the computational tests carried out, the new formulation
dominated its odd D counterparts in terms of total CPU time required to prove
optimality. The same idea could also be extended to other existing formulations
such as those presented in [7].

For sparse graph instances, the strongest model proposed in this study was
capable of solving, to proven optimality, instances as large as those previously
solved in the literature [7]. It is worth mentioning here that the models
suggested by Gouvea and Magnanti typically produce very small duality gaps.
However, they are quite demanding in terms of computer memory requirements
(particularly the model involving variables with four indices). In consequence,
they do not appear adequate for directly tackling dense graph instances of the
problem.

In spite of the considerable duality gaps observed in our computational ex- 
periments, our approach was capable of solving, to proven optimality, complete 
graph instances with up to 25 vertices. These are the first results ever presented 
for dense graph DCMST instances. 

We conclude by pointing out that the alternative odd D formulation intro- 
duced here can be further strengthened with valid inequalities associated with 
the Boolean quadric polytope. 

Abstract. The School Timetabling Problem (STP) concerns the weekly
scheduling of encounters between teachers and classes. Since this scheduling
must satisfy organizational, pedagogical and personal requirements, the problem
is recognized as a very difficult combinatorial optimization problem. This work
presents a new Tabu Search (TS) heuristic for STP. Two different memory-based
diversification strategies are presented. Computational experiments with real
world instances, in comparison with a previously proposed TS found in the
literature, show that the proposed method produces better solutions for all
instances and reaches good quality solutions faster.



1 Introduction 

The School Timetabling Problem (STP) embraces the scheduling of sequential
encounters between teachers and students so as to ensure that requirements and
constraints are satisfied. Typically, the manual solution of this problem takes
several days or weeks and normally produces unsatisfactory results, because
lesson periods may be generated that are inconsistent with pedagogical needs or
that are inconvenient for certain teachers or students. STP is considered an
NP-hard problem [5] for nearly all of its variants, justifying the usage of
heuristic methods for its resolution. Accordingly, various heuristic and
metaheuristic approaches have been applied with success to this problem, such
as Tabu Search (TS) [12,4,10], Genetic Algorithms [13] and Simulated Annealing
(SA) [2].

The application of TS to the STP is especially interesting, since this method
is, as local search methods generally are, very well suited for the interactive
building of timetables, a much recognized quality in timetable building systems.
Furthermore, TS based methods often offer the best known solutions to many
timetabling problems, when compared to other metaheuristics [3,11]. The
diversification strategy is an important aspect in the design of a TS algorithm.
Since the use of a tabu list is not enough to prevent the search process from
becoming trapped in certain regions of the search space, other mechanisms have
been proposed. In particular, for the STP, two main approaches have been used:
adaptive relaxation [10,4] and random restart [12]. In adaptive relaxation the
costs involved in the objective function are dynamically changed to bias the
search process towards new, unvisited regions of the search space. In random
restart a new solution is generated and no previous information is utilized.

C.C. Ribeiro and S.L. Martins (Eds.): WEA 2004, LNCS 3059, pp. 468-481, 2004.
© Springer-Verlag Berlin Heidelberg 2004

An Efficient Tabu Search Heuristic for the School Timetabling Problem 469

This work employs a TS algorithm that uses an informed diversification strat- 
egy, which takes into account the history of the search process to bias the se- 
lection of diversification movements. Although it uses only standard TS compo- 
nents, it provides better results than more complex previous proposals [12]. 

The article is organized as follows: section 2 presents related works; section 3 
introduces the problem to be treated; section 4 presents the proposed algorithm; 
section 5 describes the computational experiments and their results; and finally, 
section 6 formulates conclusions and future research proposals. 

2 Related Works 

Although the STP is a classical combinatorial optimization problem, no widely
accepted model is used in the literature. The reason is that the characteristics
of the problem are highly dependent on the educational system of the country
and the type of institution involved. As such, although the basic search problem
is the same, variations are introduced in different works [3,4,10,12]. The
problem considered in this paper, described in the next section, derives from
[12] and considers the timetabling problem encountered in typical Brazilian high
schools. In [12], a GRASP-Tabu Search (GTS-II) metaheuristic was developed to
tackle this problem. The GTS-II method incorporates a specialized improvement
procedure named “Intraclasses-Interclasses”, which uses a shortest-path graph
algorithm. At first, the procedure is activated aiming to attain the feasibility
of the constructed solution; after that, it aims to improve the feasible
solution. The movements made in the “Intraclasses-Interclasses” procedure also
remain with the tabu status for a given number of iterations. Diversification is
implemented through the generation of new solutions in the GRASP constructive
phase. In [11], three different metaheuristics that incorporate the
“Intraclasses-Interclasses” procedure were proposed: Simulated Annealing,
Microcanonical Optimization (MO) and Tabu Search. The TS proposal significantly
outperformed both SA and MO.



3 The Problem Considered 

The problem considered deals with the scheduling of encounters between teachers
and classes over a weekly period. The schedule is made up of d days of the week
with h daily periods, defining $p = d \times h$ distinct periods. There is a set
T with t teachers that teach a set S of s subjects to a set C of c classes,
which are disjoint sets of students with the same curriculum. The association
of teachers to subjects in certain classes is previously fixed and the workload
is given in a matrix of requirements $R_{t \times c}$, where $r_{ij}$ indicates
the number of lessons that teacher i shall teach for class j. Classes are always
available, and must have their time schedules, of size p, completely filled out,
while teachers indicate a set of available periods. Also, teachers may request a
number of double lessons per class. These are lessons which must be allocated in
two consecutive periods on the same day. A solution to the STP must therefore
satisfy the following constraints:

1. no class or teacher can be allocated for two lessons in the same period; 

2. teachers can only be allocated respecting their availabilities; 

3. each teacher must fulfill his/her weekly number of lessons; 

4. for pedagogical reasons no class can have more than two lesson periods with 
the same teacher per day. 

Also, there are the following desirable features that a timetable should 
present: 

1. the time schedule for each teacher should include the least number possible 
of days; 

2. double lesson requests must be satisfied whenever possible;

3. “gaps” in the time schedule of teachers should be avoided, that is: periods 
of no activity between two lesson periods. 
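
The first and third desirable features can be measured directly on the weekly schedule of a teacher. The sketch below is only an illustration: it assumes that a teacher's week is stored as a list of d x h entries, where 0 (free) and "X" (unavailable) mean that no lesson is taught and any other value identifies a class. These conventions and the function name are assumptions of this sketch.

```python
NO_LESSON = (0, "X")   # free period or unavailable period (assumed encoding)

def teacher_days_and_gaps(row, d, h):
    """Return (number of days with lessons, number of gaps) for one teacher."""
    days_used = 0
    gaps = 0
    for day in range(d):
        periods = row[day * h:(day + 1) * h]
        busy = [k for k, entry in enumerate(periods) if entry not in NO_LESSON]
        if busy:
            days_used += 1
            # idle periods between the first and the last lesson of the day
            gaps += (busy[-1] - busy[0] + 1) - len(busy)
    return days_used, gaps

week = [1, 0, 0, 2, 2,     # day 1: lessons in periods 1, 4 and 5, two idle periods between
        0, 3, 0, 3, 0]     # day 2: lessons in periods 2 and 4, one idle period between
print(teacher_days_and_gaps(week, d=2, h=5))   # (2, 3)
```

A schedule that is better with respect to these two features concentrates the same lessons in fewer days and in consecutive periods.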



3.1 Solution Representation 

A timetable is represented as a matrix $Q_{t \times p}$, in such a way that each row rep-
resents the complete weekly timetable for a given teacher. As such, the value
$q_{ik} \in \{0, 1, \ldots, c\}$ indicates the class for which teacher $i$ is teaching during
period $k$ ($q_{ik} \in \{1, \ldots, c\}$), or that the teacher is available for allocation ($q_{ik} = 0$).
The advantage of this representation is that it eliminates the possibility of
conflicts in the timetable of teachers. Conflicts in classes occur when,
in a given period $k$, more than one teacher is allocated
to the same class. Allocations are only allowed in periods with teacher availability.
A partial sample of a timetable with 5 teachers can be found in Figure 1, with
value “X” indicating teacher unavailability.
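As a minimal sketch of this representation, the fragment of Figure 1 can be stored as a list of rows, one per teacher. The names `FREE`, `X` and `class_conflicts` are illustrative assumptions, not taken from the paper; the function only reports the classes that have more than one teacher in a given period.

```python
from collections import Counter

FREE = 0      # teacher is free in this period (assumed encoding of the value 0)
X = None      # teacher is unavailable (assumed encoding of the "X" entries)

# Rows are teachers, columns are periods, entries are class numbers.
# These are the first five periods of the fragment shown in Figure 1.
Q = [
    [1, 0, 0, 2, 2],
    [0, X, X, 0, 1],
    [X, X, 1, 0, 3],
    [0, 1, 0, 1, 0],
    [0, 0, 2, 3, X],
]

def class_conflicts(Q, k):
    """Return the classes taught by more than one teacher in period k (0-based)."""
    counts = Counter(row[k] for row in Q if row[k] not in (FREE, X))
    return sorted(cls for cls, n in counts.items() if n > 1)
```

Because each teacher occupies exactly one cell per period, teacher conflicts cannot be expressed at all in this structure; only class conflicts need to be checked.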



Teacher \ Period    1    2    3    4    5    ...   d × h
       1            1    0    0    2    2
       2            0    X    X    0    1
       3            X    X    1    0    3
       4            0    1    0    1    0
       5            0    0    2    3    X

Fig. 1. Fragment of generated timetable



3.2 Objective Function

In order to treat STP as an optimization problem, it is necessary to define an
objective function that determines the degree of infeasibility and satisfaction
of requirements; that is, the aim is to generate feasible solutions with a mini-
mal number of unsatisfied requests. Thus, a timetable $Q$ is evaluated with the
following objective function, which should be minimized:

$$f(Q) = \omega \times f_1(Q) + \delta \times f_2(Q) + \rho \times f_3(Q) \qquad (1)$$

where $f_1$ counts, for each period $k$, the number of times that more than one
teacher teaches the same class in period $k$ and the number of times that a class
has no activity in $k$. The $f_2$ portion measures the number of allocations that dis-
regard the daily limits of lessons of teachers in classes (constraint 4). As such, a
timetable can only be considered feasible if $f_1(Q) = f_2(Q) = 0$. The importance
of the costs involved defines a hierarchy so that $\omega > \delta \gg \rho$. The $f_3$ component in
the objective function measures the satisfaction of personal requests from teach-
ers, namely: double lessons, non-existence of “gaps” and timetable compactness,
as follows:

$$f_3(Q) = \sum_{i=1}^{t} \alpha_i \times b_i + \beta_i \times v_i + \gamma_i \times c_i \qquad (2)$$

where $\alpha_i$, $\beta_i$, and $\gamma_i$ are weights that reflect, respectively, the relative importance
of the number of “gaps” $b_i$, the number of week days $v_i$ each teacher is involved
in any teaching activity during the same shift, and the non-negative difference
$c_i$ between the minimum required number of double lessons and the effective
number of double lessons in the current agenda of teacher $i$.
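The following sketch shows one way equations (1) and (2) could be evaluated on the matrix representation. It rests on several assumptions that the text does not fix: unavailability is encoded as `None`, a conflict or an idle class counts one unit per class and period, a double lesson is a pair of consecutive periods with the same class on the same day, and double-lesson requests are given as one number per teacher (`doubles_required`). The default weights are the ones reported in the experiments section.

```python
def split_days(row, h):
    """Split one teacher's weekly row into per-day lists of h periods."""
    return [row[s:s + h] for s in range(0, len(row), h)]

def f1(Q, c):
    """Class conflicts plus idle class periods (one unit per class and period)."""
    total = 0
    for k in range(len(Q[0])):
        for cls in range(1, c + 1):
            n = sum(1 for row in Q if row[k] == cls)
            if n != 1:          # n == 0: class idle, n > 1: class conflict
                total += 1
    return total

def f2(Q, c, h, daily_limit=2):
    """Lessons beyond the daily limit of a teacher in a class (constraint 4)."""
    total = 0
    for row in Q:
        for day in split_days(row, h):
            for cls in range(1, c + 1):
                total += max(0, day.count(cls) - daily_limit)
    return total

def teacher_terms(row, h):
    """Return (gaps b_i, working days v_i, double lessons) for one teacher."""
    gaps = days = doubles = 0
    for day in split_days(row, h):
        busy = [k for k, v in enumerate(day) if v not in (0, None)]
        if not busy:
            continue
        days += 1
        gaps += (busy[-1] - busy[0] + 1) - len(busy)
        k = 0
        while k < h - 1:        # count non-overlapping consecutive pairs
            if day[k] not in (0, None) and day[k] == day[k + 1]:
                doubles += 1
                k += 2
            else:
                k += 1
    return gaps, days, doubles

def f(Q, c, h, doubles_required, omega=100, delta=30, rho=1,
      alpha=3, beta=3, gamma=1):
    """Weighted objective of equation (1), with f3 as in equation (2)."""
    f3 = 0
    for row, req in zip(Q, doubles_required):
        b, v, dbl = teacher_terms(row, h)
        f3 += alpha * b + beta * v + gamma * max(0, req - dbl)
    return omega * f1(Q, c) + delta * f2(Q, c, h) + rho * f3
```

With these weights a single infeasibility always outweighs any realistic amount of unsatisfied preferences, which is the hierarchy the text describes.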

4 Tabu Search for the School Timetabling Problem 

Tabu Search (TS) is an iterative method for solving combinatorial optimization 
problems. It explicitly makes use of memory structures to guide a hill-descending 
heuristic to continue exploration without being confounded by the absence of 
improvement movements. This technique was independently proposed by Glover 
[6] and Hansen [8]. For a detailed description of TS, the reader is referred to [7]. 
This section presents a brief explanation of TS principles. They are followed by 
specifications of the customized TS implementation proposed in this paper. 

Starting from an initial solution $x$, the method systematically explores the
neighborhood $\mathcal{N}(x)$ and selects the best admissible movement $m$, such that the
application of $m$ to the current solution $x$ (denoted by $x \oplus m$) produces the
new current solution $x' \in \mathcal{N}(x)$. When no improvement movements are found,
movements that deteriorate the cost function are also permitted. Thus, to try
to avoid cycling, a mechanism called short-term memory is employed. The ob-
jective of short-term memory is to forbid movements toward already visited
solutions, which is usually achieved by prohibiting the reversal of the most
recent movements. These reversal movements are stored in a tabu list and remain
forbidden (with tabu status) for a given number of iterations, called tabu tenure.
Since this strategy can be too restrictive, and in order not to disregard
high-quality solutions, movements with tabu status can be accepted if the cost
of the new solution produced satisfies an aspiration criterion. Also,
intensification and diversification procedures can be used. These procedures,
respectively, aim to deeply investigate promising regions of the search space
and to ensure that no region of the search space remains neglected. Following
is a description of the constructive algorithm and the customized TS
implementation proposed in this paper.

4.1 Constructive Algorithm 

The constructive algorithm basically consists of a greedy randomized construc-
tive procedure [9]. While in other works the option for a randomized construction
is to allow diversification through multiple initializations, in this case the only
purpose is to control the randomization degree of the initial solution. To build
a solution step by step, the principle of allocating first the most urgent lessons
in the most appropriate periods is used. In this case, the urgency degree $\theta_{ij}$ of al-
locating a lesson from teacher $i$ for class $j$ is computed considering the available
periods of teacher $i$, the available periods of class $j$ and the number
of unscheduled lessons of teacher $i$ for class $j$, as follows: $\theta_{ij} =$

Also, let $L$ be the set of required lessons $l_{ij}$ whose number of unscheduled
lessons is positive. The
algorithm then builds a restricted candidate list (RCL) with the most urgent
lessons, in such a way that $RCL = \{ l_{ij} \in L \mid \theta_{ij} \geq \bar{\theta} - (\bar{\theta} - \underline{\theta}) \times \alpha \}$, where
$\bar{\theta} = \max\{\theta_{ij} \mid i \in T, j \in C\}$ and $\underline{\theta} = \min\{\theta_{ij} \mid i \in T, j \in C\}$. At each iteration,
a lesson $l_{ij}$ for allocation is randomly selected from RCL. The lesson is allo-
cated in a free period of its corresponding teacher, attempting to keep the
timetable free of conflicts in classes. Allocations are made first in periods with
the least teacher availability. The $\alpha$ parameter allows tuning the randomization
degree of the algorithm, varying from the pure greedy algorithm ($\alpha = 0$) to
a completely random ($\alpha = 1$) selection of the teacher and class for allocation.
At each step, the number of unscheduled lessons and the urgency degrees are
recomputed. The process continues until no more unscheduled lessons are found
(i.e., the number of unscheduled lessons is zero for all $i \in T$, $j \in C$).
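The restricted candidate list can be sketched as follows. The urgency degrees are taken as given input here, since only the threshold rule is illustrated; the dictionary layout and the function names are assumptions made for this example.

```python
import random

def restricted_candidate_list(urgency, alpha):
    """Select the lessons whose urgency reaches the RCL threshold.

    urgency maps a lesson, e.g. a (teacher, class) pair, to its urgency degree.
    alpha = 0 keeps only the most urgent lessons, alpha = 1 keeps all of them.
    """
    hi = max(urgency.values())
    lo = min(urgency.values())
    threshold = hi - (hi - lo) * alpha
    return [lesson for lesson, theta in urgency.items() if theta >= threshold]

def pick_lesson(urgency, alpha, rng=random):
    """Randomly select the next lesson to allocate from the RCL."""
    return rng.choice(restricted_candidate_list(urgency, alpha))
```

For instance, with urgencies 9, 5 and 1 and `alpha = 0.5` the threshold is 5, so the two most urgent lessons are candidates and the third is excluded.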

4.2 Tabu Search Components 

The TS procedure (Figure 2) starts from the initial timetable $Q$ provided by the
constructive algorithm and, at each iteration, fully explores the neighborhood
$\mathcal{N}(Q)$ to select the next movement $m$. The movement definition used here is the
same as in [10], and involves the swap of two values in the timetable of a teacher
$i \in T$, which can be defined as the triple $\langle i, p_1, p_2 \rangle$, such that $q_{ip_1} \neq q_{ip_2}$, $p_1 < p_2$
and $p_1, p_2 \in \{1, \ldots, p\}$. Clearly, any timetable can be reached through a sequence
of these movements whose length is at most the number of lessons in the requirement
matrix. Once a movement $m$ is selected, its reversal movement will be kept in the
tabu list during the next tabuTenure(m) iterations, which is randomly selected

procedure TSDS(Q, divActivation, iterationsDiv, minTT, maxTT)
 1  Q* = Q; TabuList = ∅;
 2  noImprovementIterations = 0; iteration = 0;
 3  initializeMovementFrequencies();
 4  repeat
 5     Δ = ∞; iteration++;
 6     for all movement m such that (Q ⊕ m) ∈ N(Q) do
 7        penalty = 0;
 8        if noImprovementIterations mod divActivation < iterationsDiv
              and iteration > divActivation then
 9           penalty = computePenalty(m);
10        end if
11        if (f(Q ⊕ m) − f(Q) + penalty < Δ and (m ∉ TabuList)) or
              (f(Q ⊕ m) < f(Q*)) then
12           bestMov = m;
13           Δ = f(Q ⊕ m) − f(Q);
14        end if
15     end for
16     Q = Q ⊕ bestMov;
17     tabuTenure(bestMov) = random(minTT, maxTT);
18     updateTabuList(bestMov, iteration);
19     computeMovementFrequency(bestMov);
20     if (f(Q) < f(Q*)) then
21        Q* = Q; noImprovementIterations = 0;
22        initializeMovementFrequencies();
23     else
24        noImprovementIterations++;
25     end if
26  until termination criterion reached;
end TSDS;



Fig. 2. Pseudo-code for TSDS algorithm 



(line 17) within the interval [minTT, maxTT]. Insertions and removals in the tabu
list can be made at every iteration (line 18). The aspiration criterion defined is
that a movement will lose its tabu status if its application produces the best
solution found so far (line 11).
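The interplay of tabu status, random tenure and aspiration can be sketched as below. This is an illustration under stated assumptions rather than the authors' code: moves are hashable values, `cost_after` is a hypothetical callable returning the cost of the timetable after a move, and the tabu list is a dictionary mapping a move to the iteration at which its tabu status expires. For a swap move the reversal is the same triple, because swapping the same two periods again undoes it.

```python
import random

def select_move(moves, cost_after, current_cost, best_cost, tabu_until,
                iteration, penalty=lambda m: 0):
    """Pick the non-tabu move with the smallest penalised cost increase.

    A move that yields a new best solution is accepted even if it is tabu
    (aspiration criterion).
    """
    best_move, best_delta = None, float("inf")
    for m in moves:
        new_cost = cost_after(m)
        is_tabu = tabu_until.get(m, 0) > iteration
        aspirated = new_cost < best_cost
        delta = new_cost - current_cost + penalty(m)
        if (not is_tabu and delta < best_delta) or aspirated:
            best_move, best_delta = m, new_cost - current_cost
    return best_move

def make_tabu(tabu_until, move, iteration, min_tt, max_tt, rng=random):
    """Forbid the move for a tenure drawn uniformly from [min_tt, max_tt]."""
    tabu_until[move] = iteration + rng.randint(min_tt, max_tt)
```

Drawing the tenure at random for every move, instead of using one fixed value, makes it less likely that the search falls into a cycle whose length matches the tenure.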

Since short-term memory is not enough to prevent the search process from
becoming entrenched in certain regions of the search space, some diversification
strategy is needed. In the proposed method, long-term memory is used to guide
the diversification procedure as follows: the frequencies of movements involving each
teacher and class are computed. While the diversification procedure is active, the
selection of movements emphasizes the execution of rarely explored movements,
through the incorporation of penalties (line 9) in the evaluation of movements.
Each time a movement is made, movement frequencies are updated (line
19). These frequencies are zeroed each time the best solution found so
far is updated (line 22). Following is an explanation of how the penalties in
function computePenalty (line 9) are computed. Considering that the counts
of movements made with each teacher and class are stored in a matrix $Z_{t \times c}$,
the penalty for a given movement takes into account the transition ratio $\epsilon_{ij}$ of
teacher $i$ and class $j$, which can be computed as follows:

$$\epsilon_{ij} = \frac{z_{ij}}{\bar{z}} \qquad (3)$$

where $\bar{z} = \max\{ z_{ij} \mid i \in T, j \in C \}$. Since a movement can involve two lesson
periods, or a lesson period and a free period, the penalty $\psi_{i a_1 a_2}$ associated with
a movement in the timetable of teacher $i$, in periods $p_1$ and $p_2$ with allocations
$a_1 = q_{ip_1}$ and $a_2 = q_{ip_2}$, respectively, considering the cost of the best solution
found so far $Q^*$ is:

$$\psi_{i a_1 a_2} = \begin{cases}
\epsilon_{i a_1} \times f(Q^*) & \text{if } a_1 \neq 0 \text{ and } a_2 = 0 \\
\epsilon_{i a_2} \times f(Q^*) & \text{if } a_1 = 0 \text{ and } a_2 \neq 0 \\
(\epsilon_{i a_1} + \epsilon_{i a_2})/2 \times f(Q^*) & \text{if } a_1 \neq 0 \text{ and } a_2 \neq 0
\end{cases}$$



Another penalty function proposed in this paper also considers the teacher 
workload to promote diversification. In this case, the objective is to favor move- 
ments involving teachers whose timetable changes would probably produce big- 
ger modifications in the solution structure. The value of the penalty function 
$\tau_{i a_1 a_2}$ for allocations $a_1$ and $a_2$ of teacher $i$ is:



V'i. 



ELi Gj/max*^i Pj) 



(4) 



The diversification strategy is applied whenever signs of regional en-
trenchment are detected. To this end, the number of non-
improvement iterations is evaluated before starting the diversification strategy
(line 8). The number of non-improvement iterations necessary to start the diver-
sification process (divActivation) and the number of iterations that the process
will remain active (iterationsDiv) are input parameters. Movements performed
in this phase can be viewed as influential movements [7], since these movements
try to modify the solution structure in an influential (non-random) manner. The
function computePenalty (line 9) can use either of the penalty functions previ-
ously presented. In the following sections, the implementation that considers the
penalty function which only takes into account the frequency ratio of transitions
will be referred to as TSDS, while the implementation that uses the penalty func-
tion which also takes into account the workload of teachers will be referred to as
TSDSTL. For comparison purposes, an implementation without the diversification
strategy (TS) will also be considered in the next section.
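A possible reading of the frequency-based penalty used by TSDS is sketched below. It assumes that the transition ratio is the move count of a teacher and class divided by the largest count in the matrix, and that the penalty averages the ratios of the classes involved and scales them by the cost of the best solution. The 0-based indexing of `Z` and the function names are choices made for this example only.

```python
def transition_ratio(Z, i, j):
    """Move count of teacher i with class j, relative to the largest count."""
    z_max = max(max(row) for row in Z)
    return Z[i][j] / z_max if z_max else 0.0

def penalty(Z, i, a1, a2, best_cost):
    """Penalty for swapping allocations a1 and a2 in the row of teacher i.

    a1 and a2 are class numbers starting at 1, with 0 meaning a free period.
    Z[i][a - 1] holds the number of moves made so far with teacher i and class a.
    """
    ratios = [transition_ratio(Z, i, a - 1) for a in (a1, a2) if a != 0]
    if not ratios:              # two free periods: not a valid swap, no penalty
        return 0.0
    return sum(ratios) / len(ratios) * best_cost
```

Scaling by the best cost keeps the penalty on the same order of magnitude as the objective values it is added to, so frequently used teacher and class pairs become visibly less attractive while diversification is active.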



5 Computational Experiments and Discussion 

Experiments were carried out on the set of instances originating from [12], with
data from Brazilian high schools, with 25 lesson periods per week for each
class, in different shifts. Table 1 lists some of the characteristics of the instances,
such as dimension and sparseness ratio (sr), which can be com-
puted considering the total number of lessons (#lessons) and the total number
of unavailable periods (#u): sr = . Lower sparseness values
indicate more restrictive problems and, likewise, problems that are harder to solve.

Table 1. Characteristics of problem instances.

Instance  Teachers  Classes  Total    Double   Sparseness
                             Lessons  Lessons  Ratio (sr)
   1          8        3        75      21        0.43
   2         14        6       150      29        0.50
   3         16        8       200       4        0.30
   4         23       12       300      66        0.18
   5         31       13       325      71        0.58
   6         30       14       350      63        0.52
   7         33       20       500      84        0.39

The algorithms were coded in C++. The implementation of GTS-II was the
same presented in [12], and was implemented in C. The compiler used was GCC
3.2.3 with flag -O2. The experiments were performed on a microcomputer with
an AMD Athlon XP 1533 MHz processor and 512 megabytes of RAM, running the
Linux operating system.

The weights in the objective function were defined as in [12]: $\omega = 100$, $\delta = 30$,
$\rho = 1$, $\alpha_i = 3$, $\beta_i = 3$ and $\gamma_i = 1$, $\forall i = 1, \ldots, t$.

In the first set of experiments, the objective was to verify the average so-
lution cost produced by each algorithm within given time limits. The results
(Table 2) consider the average best solution found in 20 independent executions,
with the following time limits for instances 1, ..., 7, respectively: {90, 280, 380,
870, 1930, 1650, 2650}. The parameters for GTS-II and the time limits are the
same as proposed in [12]. The parameters for TSDS and its variations are: $\alpha = 0.1$
(constructive algorithm), minTT = 20, maxTT = 25, divActivation = 500 and
iterationsDiv = 10. Best results are shown in bold.



Table 2. Average results with fixed time limits. 



Instance     GTS-II     TSDSTL       TSDS         TS
   1         204.80     203.42     203.37     207.05
   2         350.10     344.84     345.36     349.26
   3         455.70     439.94     439.05     455.58
   4         686.30     669.69     672.15     670.92
   5         796.30     782.74     780.74     782.84
   6         799.10     783.38     781.77     787.85
   7       1,076.20   1,060.84   1,059.05   1,071.21



Table 3. Average costs of objective function components obtained by the constructive
algorithm and at the end of the tabu search heuristic TSDS.

                    Constructive Algorithm                                  TSDS
Instance  f1(Q)  f2(Q)  #d (%d)      #g (%g)      cr    f1(Q*)  f2(Q*)  #d (%d)      #g (%g)     cr
   1       0.0    0.5   15.1 (71.5)  17.2 (22.9)  1.6    0.0     0.0     2.0 (9.5)    4.0 (5.3)  1.2
   2       0.0    0.0   24.3 (83.8)  24.8 (16.5)  1.3    0.0     0.0     8.2 (28.3)   1.2 (0.8)  1.0
   3       0.3    2.5    2.0 (50.0)  31.2 (15.6)  1.4    0.0     0.0     0.4 (8.8)    5.0 (2.5)  1.1
   4       4.3    0.9   35.5 (53.8)  21.0 (7.0)   1.2    0.0     0.0    19.9 (30.2)   3.7 (1.2)  1.0
   5       0.0    0.2   54.1 (76.1)  46.4 (14.3)  1.5    0.0     0.0    13.6 (19.2)   4.1 (1.2)  1.1
   6       0.2    0.0   53.7 (85.2)  53.4 (15.3)  1.4    0.0     0.0    13.5 (21.4)   9.9 (2.8)  1.0
   7       0.5    0.2   69.6 (82.9)  74.1 (14.8)  1.3    0.0     0.0    24.4 (29.0)  10.7 (2.1)  1.0


Instance 1 - target: 215 




Fig. 3. Empirical probability distribution of finding the target value as a function
of time for instance 1



Table 4. Time (in seconds) for 25%, 50% and 75% of runs to achieve the target solution
values.

                     GTS-II                        TSDS
Instance      25%      50%      75%       25%      50%      75%
   1          7.64     9.57    12.15      2.13     3.36     6.39
   2         21.39    26.57    34.68      9.03    13.48    19.71
   3         28.57    46.84    85.41     16.29    27.66    46.47
   4         49.22    92.57   146.50      2.65     3.40     5.45
   5         47.79    62.85   102.20     27.63    37.85    54.51
   6         35.81    48.00    72.12     25.20    33.97    44.38
   7         92.41   150.72   287.48     89.57   118.82   155.72


Instance 2 - target: 365

Fig. 4. Empirical probability distribution of finding the target value as a function
of time for instance 2



As can be seen in Table 2, although only minor differences can be observed
between the two implementations that use different penalty functions in the diver-
sification strategy, results show that versions using the informed diversification
strategy perform significantly better than GTS-II and TS. In order to evaluate
the quality of the solutions obtained by the proposed method, and to verify how
significant the improvement of TSDS is over the solution received from the con-
structive algorithm, Table 3 presents the average costs involved in each objective
function component, for the solution provided by the constructive algorithm and
for the improved solution from TSDS. Columns #d (%d), #g (%g) and cr are
related to the $f_3$ component of the objective function, in the following way:
#d (%d) indicates the number of unsatisfied double lessons (and the percentage of
unsatisfied double lessons, considering the number of double lesson requests),
#g (%g) indicates the number of “gaps” in the timetable of teachers (and the
percentage considering the total number of lessons) and cr measures the
compactness ratio of the timetable of teachers. To compute cr, the sum of the
actual number of days $ad$ that each teacher must attend some lesson in the school
and the lower bound for this value, $\underline{ad}$, are used. The $\underline{ad}$ value considers the
minimum number of days $md_i = \lceil \sum_{j=1}^{c} r_{ij} / h \rceil$ that each teacher $i$ must
attend some lesson in the school, such that $\underline{ad} = \sum_{i=1}^{t} md_i$. This way,
$cr = ad / \underline{ad}$. Values close to one
indicate that the timetable is as compact as it can be. As can be seen in Table
3, the solution provided by the constructive algorithm usually contains some
type of infeasibility. These problems were always solved by the TSDS algorithm,
in a way that no infeasible timetable was produced. Regarding the preferences
of teachers, for timetable compactness, which has the highest weight in the $f_3$
component of the objective function, the optimal value (cr = 1) was reached in
most cases. Also, small percentage values of “gaps” and
unsatisfied double lessons were obtained.

[Plot panels: Instance 3 - target: 480; Instance 4 - target: 760; x-axis: Time (seconds)]

Fig. 5. Empirical probability distribution of finding target values as a function
of time for instances 3 and 4

[Plot panels: Instance 5 - target: 820; Instance 6 - target: 825; Instance 7 - target: 1100;
y-axis: Cumulative probability]

Fig. 6. Empirical probability distribution of finding target values as a function
of time for instances 5, 6 and 7

In another set of experiments, the objective was to verify the empirical proba-
bility distribution of reaching a given sub-optimal target value (i.e., finding a solution
with cost at least as good as the target value) as a function of time on different
instances. The sub-optimal values were chosen in a way that the slowest algo-
rithm could terminate in a reasonable amount of time. In these experiments,
TSDS and GTS-II were evaluated and the execution times of 150 independent
runs for each instance were computed. The experiment design follows the pro-
posal of [1]. The results of each algorithm were plotted by associating with the $i$-th
smallest running time $t_i$ a probability $p_i = (i - \frac{1}{2})/150$, which generates points
$z_i = (t_i, p_i)$, for $i = 1, \ldots, 150$. As can be seen in Figures 3 to 6, the TSDS
heuristic achieves high probability values (> 50%) of reaching the target values
in significantly smaller times than GTS-II. This difference is most pronounced
in instance 4, which presents a very low sparseness ratio. This result may be re-
lated to the fact that the “Intraclasses-Interclasses” procedure of GTS-II works
with movements that use free periods, which are hard to find in this instance.
Another analysis, taking into account all test instances, shows that at the time
when 95% of TSDS runs have achieved the target value, on average, only 64% of
GTS-II runs have achieved the target value. Considering the time when 50% of
TSDS runs have achieved the target value, only 11%, on average, of GTS-II runs
have achieved the target value. Table 4 presents the execution times needed by
GTS-II and TSDS to achieve different probabilities of reaching the target values.
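The plotted points of such a time-to-target experiment can be computed with a few lines of code. The sketch below works for any number of runs; the function names are assumptions made for this example, and the second helper is one simple way to read quantile times of the kind reported in Table 4 from the empirical distribution.

```python
def ttt_points(times):
    """Return the points (t_i, p_i) with p_i = (i - 1/2) / n over sorted times."""
    n = len(times)
    return [(t, (i - 0.5) / n) for i, t in enumerate(sorted(times), start=1)]

def time_to_probability(times, prob):
    """Smallest observed time whose empirical probability reaches prob."""
    for t, p in ttt_points(times):
        if p >= prob:
            return t
    return None   # the requested probability is not reached by these runs
```

Using $(i - \frac{1}{2})/n$ rather than $i/n$ keeps every probability strictly between 0 and 1, which matters when the points are later compared with a fitted distribution.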

6 Concluding Remarks 

This paper presented a new tabu search heuristic to solve the school timetabling
problem. Experiments on real-world instances showed that the proposed method
significantly outperforms a previously developed hybrid tabu search algorithm,
and it has the advantage of a simpler design.

Contributions of this paper include the empirical verification that, although
informed diversification strategies are not commonly employed in tabu search
implementations for the school timetabling problem, their incorporation can sig-
nificantly improve the robustness of the method. The proposed method not only pro-
duced better solutions for all test instances but also performed faster than a
hybrid tabu search approach.

Although the proposed method offers quite an improvement, future re-
search may combine the “Intraclasses-Interclasses” procedure with an informed
diversification strategy, which could lead to even better results.



Abstract. Graphs can be represented symbolically by the Ordered Bi-
nary Decision Diagram (OBDD) of their characteristic function. To solve
problems in such implicitly given graphs, specialized symbolic algorithms
are needed which are restricted to the use of functional operations offered
by the OBDD data structure. In this paper, two symbolic algorithms
for the single-source shortest-path problem with nonnegative
integral edge weights are presented which represent symbolic versions
of Dijkstra’s algorithm and the Bellman-Ford algorithm. They execute
$O(N \cdot \log(NB))$ resp. $O(NM \cdot \log(NB))$ OBDD-operations to obtain the
shortest paths in a graph with $N$ nodes, $M$ edges, and maximum edge
weight $B$. Despite the larger worst-case bound, the symbolic Bellman-
Ford-approach is expected to behave much better on structured graphs
because it is able to handle updates of node distances effectively in paral-
lel. Hence, both algorithms have been studied in experiments on random,
grid, and threshold graphs with different weight functions. These stud-
ies support the assumption that the Dijkstra-approach behaves efficiently
w. r. t. space usage, while the Bellman-Ford-approach is dominant w. r. t.
runtime.



1 Introduction 

Algorithms on graphs $G$ with node set $V$ and edge set $E \subseteq V^2$ typically work
on adjacency lists of size $\Theta(|V| + |E|)$ or on adjacency matrices of size $\Theta(|V|^2)$.
These representations are called explicit. However, there are application areas in 
which problems on graphs of such large size have to be solved that an explicit 
representation on today’s computers is not possible. In the area of logic syn- 
thesis and verification, state-transition graphs with for example 10^^ nodes and 
10^® edges occur. Other applications produce graphs which are representable in 
explicit form, but for which even runtimes of efficient polynomial algorithms are 
not practicable anymore. Modeling of the WWW, street, or social networks are 
examples of this problem scenario. 

However, we expect the large graphs occurring in application areas to contain
regularities. If we consider graphs as Boolean functions, we can represent them
by Ordered Binary Decision Diagrams (OBDDs) [4,5,19]. This data structure
is well established in verification and synthesis of sequential circuits [9,11,19]
due to its good compression of regular structures. In order to represent a graph
$G = (V, E)$ by an OBDD, its edge set $E$ is considered as a characteristic Boolean
function $\chi_E$, which maps binary encodings of $E$’s elements to 1 and all others to
0. This representation is called implicit or symbolic, and is not essentially larger
than explicit ones. Nevertheless, we hope that advantageous properties of $G$ lead
to small, that is sublinear, OBDD-sizes.

Having such an OBDD-representation of a graph, we are interested in solv- 
ing problems on it without extracting too much explicit information from it. 
Algorithms that are mainly restricted to the use of functional operations are 
called implicit or symbolic algorithms [19]. They are considered as heuristics to 
save time and/or space when large structured input graphs do not fit into the 
internal memory anymore. Then, we hope that each OBDD-operation processes 
many edges in parallel. The runtime of such methods depends on the number of 
executed operations as well as on the efficiency of each single one. The latter in 
turn depends on the size of the operand OBDDs. 

In general, we want heuristics to perform well on special input subsets, while 
their worst-case runtime is typically worse than for optimal algorithms. Sym- 
bolic algorithms often have proved to behave better than explicit methods on 
interesting graphs and are well established in the area of logic design and veri- 
fication. Most papers on OBDD-based algorithms prove their usability just by 
experiments on benchmark inputs from a special application area [10,11,13]. In 
less application-oriented works considering more general graph problems, mostly 
the number of OBDD-operations is bounded as a hint on the real over-all run- 
time [3,8,7,14]. Only a few of them contain analyses of the over-all runtime and 
space usage for special cases like grids [15,16,20]. 

Until now, symbolic shortest-path algorithms only existed for graph rep- 
resentations by algebraic decision diagrams (ADDs) [1], which are difficult to 
analyze and are useful only for a small number of different weight values. A 
new OBDD-based approach to the all-pairs shortest-paths problem [18] aims at 
polylogarithmic over-all runtime on graphs with very special properties. In con- 
trast to these results, the motivation of this paper’s research was to transform 
popular methods for the single-source shortest-path problem into symbolic algo- 
rithms, and to compare their performance in experiments. This has been done 
for Dijkstra’s algorithm as well as for the Bellman-Ford algorithm. 

This paper is organized as follows: Section 2 introduces the principles of sym-
bolic graph representation by OBDDs. Section 3 presents the symbolic shortest-
path algorithms studied in this paper. Section 4 discusses experimental results
of the algorithms on random, grid, and threshold graphs. Finally, Sect. 5 gives
conclusions on the work.

2 Symbolic Graph Representation

We denote the class of Boolean functions $f: \{0,1\}^n \to \{0,1\}$ by $B_n$. The $i$th
character of a binary number $x \in \{0,1\}^n$ is denoted by $x_i$ and $|x| := \sum_{i=0}^{n-1} x_i 2^i$
identifies its value.

Consider a directed graph $G = (V,E)$ with node set $V = \{v_0, \ldots, v_{2^n-1}\}$ and
edge set $E \subseteq V^2$. $G$ can be represented by a characteristic Boolean function
$\chi_E \in B_{2n}$ which maps pairs $(x,y) \in \{0,1\}^{2n}$ of binary node numbers of length $n$ to 1 iff
$(v_{|x|}, v_{|y|}) \in E$. We can capture more complex graph properties by adding further
arguments to characteristic functions. An additional weight function $c: E \to
\{0, \ldots, 2^m - 1\}$ is modeled by $\chi_c \in B_{2n+m}$ which maps triples $(x,y,d)$ to 1 iff
$(v_{|x|}, v_{|y|}) \in E$ and $c(v_{|x|}, v_{|y|}) = |d|$.
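To make the encoding concrete, the following sketch builds the characteristic function of a small explicit graph as an ordinary Python function on bit tuples. It only illustrates the mapping from node numbers to bit strings; it is not an OBDD, and the names `value` and `chi_E` as well as the example graph are assumptions made here.

```python
from itertools import product

def value(bits):
    """|x|: value of a bit tuple (x_{n-1}, ..., x_0), most significant bit first."""
    v = 0
    for b in bits:
        v = 2 * v + b
    return v

def chi_E(edges):
    """Return the characteristic function of an edge set on two bit tuples."""
    edge_set = set(edges)
    return lambda x, y: int((value(x), value(y)) in edge_set)

# Example graph on 4 nodes (n = 2): v0 -> v1 -> v2 -> v3.
edges = [(0, 1), (1, 2), (2, 3)]
chi = chi_E(edges)

# All satisfying assignments of chi, i.e. the encoded edges.
ones = [(x, y) for x in product((0, 1), repeat=2)
               for y in product((0, 1), repeat=2) if chi(x, y)]
```

A weight function would be handled the same way, with a third bit tuple for the weight value added to the arguments.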

A Boolean function $f \in B_n$ defined on variables $x_0, \ldots, x_{n-1}$ can be repre-
sented by an OBDD $\mathcal{G}_f$ [4,5,19]. An OBDD is a directed acyclic graph consisting
of internal nodes and sink nodes. Each internal node is labeled with a Boolean
variable $x_i$, while each sink node is labeled with a Boolean constant. Each in-
ternal node is left by two edges, one labeled by 0 and the other by 1. A func-
tion pointer $p$ marks a special node that represents $f$. Moreover, a permutation
$\pi \in \Sigma_n$ called variable order must be respected by the internal nodes’ labels on
every path in the OBDD.

For a given variable assignment $a \in \{0,1\}^n$, we compute the corresponding
function value $f(a)$ by traversing $\mathcal{G}_f$ from $p$ to a sink labeled with $f(a)$ while
leaving a node $x_i$ via its $a_i$-edge. The size $\mathrm{size}(\mathcal{G}_f)$ of $\mathcal{G}_f$ is measured by the
number of its nodes. An OBDD is called complete if every path from $p$ to a
sink has length $n$. This need not be the case in general, because OBDDs may
skip a variable test. We adopt the usual assumption that all OBDDs occurring in
symbolic algorithms are minimal, since all OBDD-operations exclusively produce
minimized diagrams, which are known to be canonical. There is an upper bound
of $(2 + o(1)) 2^n / n$ for the OBDD-size of every $f \in B_n$; hence, a graph’s edge set
$E \subseteq V^2$ has an OBDD of worst-case size $O(|V|^2 / \log |V|)$.
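The evaluation procedure described above can be sketched in a few lines. The node table used here (a dictionary from node names to triples) is only an assumed toy layout for illustration; real OBDD packages use shared, hashed node stores.

```python
# Each internal node is (variable index, 0-successor, 1-successor);
# the sinks are the Python constants 0 and 1.
def evaluate(nodes, root, a):
    """Follow the a_i-edges from the root to a sink and return its label."""
    node = root
    while node not in (0, 1):
        var, low, high = nodes[node]
        node = high if a[var] else low
    return node

# OBDD for f(x0, x1) = x0 AND x1 with variable order x0, x1.
nodes = {"u": (0, 0, "v"), "v": (1, 0, 1)}
```

For the assignment with $x_0 = 0$ the traversal reaches the 0-sink after a single step without testing $x_1$, which is an example of a path that is shorter than $n$ in a diagram that is not complete.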

OBDD-operations. The satisfiability of $f$ can be decided in time $O(1)$. The
negation $\overline{f}$ as well as the replacement of a function variable $x_i$ by a constant $c$
(i.e., $f|_{x_i=c}$) is computable in time $O(\mathrm{size}(\mathcal{G}_f))$. Whether two functions $f$ and $g$
are equivalent (i.e., $f = g$) can be decided in time $O(\mathrm{size}(\mathcal{G}_f) + \mathrm{size}(\mathcal{G}_g))$. These
operations are called cheap. Further essential operations are the binary synthesis
$f \otimes g$ for $f, g \in B_n$, $\otimes \in B_2$ (e.g., “$\wedge$” and “$\vee$”) and the quantification $(Q x_i) f$ for
a quantifier $Q \in \{\exists, \forall\}$. In general, the result $\mathcal{G}_{f \otimes g}$ has size $O(\mathrm{size}(\mathcal{G}_f) \cdot \mathrm{size}(\mathcal{G}_g))$,
which is also the general runtime of this operation. The computation of $\mathcal{G}_{(Q x_i) f}$
can be realized by two cheap operations and one binary synthesis in time and
space $O(\mathrm{size}^2(\mathcal{G}_f))$.

Notation. The characteristic functions used for symbolic representation are
typically defined on several subsets of Boolean variables, each representing a
different argument. For example, a weighted graph’s function $\chi_c$ is defined on
two binary node numbers $x = x_{n-1} \ldots x_0$ and $y = y_{n-1} \ldots y_0$ and a binary
weight value $d = d_{m-1} \ldots d_0$. We assume w.l.o.g. that all arguments consist of
the same number of $n$ variables. Moreover, both a function $\chi_S \in B_{kn}$ defined on
$k$ arguments $x^{(1)}, \ldots, x^{(k)} \in \{0,1\}^n$ as well as its OBDD-representation will
be denoted by $S(x^{(1)}, \ldots, x^{(k)})$ in this paper. We use an interleaved variable order
$\pi = (x^{(1)}_0, \ldots, x^{(k)}_0, x^{(1)}_1, \ldots, x^{(k)}_{n-1})$, which enables swapping [19] arguments in
time $O(\mathrm{size}(\chi_S))$ (e.g., $F(x,y) := G(y,x)$).

The symbolic algorithms will be described in terms of functional assignments
like “$F(x) := G(x) \wedge H(x)$.” The quantification $(\exists y_{n-1} \ldots \exists y_0)\, F(x,y)$ over the
$n$ bits of an argument $y$ will be denoted by $(\exists y)\, F(x,y)$. Although this seems
to be one OBDD-operation, it corresponds to $O(n)$ quantification operations.
Identifiers with braced superscripts mark additional arguments of characteristic
functions occurring only temporarily in quantified formulas. Further-
more, the functional assignments will contain tool functions for comparisons of
weighted sums like $F(x,y,z) := (|x| + |y| = |z|)$. These can be composed from
multivariate threshold functions.

Definition 1. Let $f \in B_{kn}$ be defined on variables $x^{(1)}, \ldots, x^{(k)} \in \{0,1\}^n$.
Moreover, let $W, T \in \mathbb{Z}$, and $w_1, \ldots, w_k \in \{-W, \ldots, W\}$. $f$ is called a $k$-variate
threshold function iff it is

$$f(x^{(1)}, \ldots, x^{(k)}) = \left( \sum_{i=1}^{k} w_i \cdot |x^{(i)}| \geq T \right).$$

$W$ is called the maximum absolute weight of $f$.

Besides the greater or equal comparison, the relations $>$, $<$, $\leq$, and $=$ can
be realized by binary syntheses of multivariate threshold functions, too. For a
constant number $k$ of arguments and a constant maximum absolute weight $W$,
such a comparison function $f \in B_{kn}$ has a compact OBDD of size $O(n)$ [20].
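As an explicit (non-symbolic) illustration, a threshold function in the sense of Definition 1 and an equality comparison composed from two of them might look as follows. The sketch assumes that a threshold function tests whether a weighted sum of the argument values is at least $T$; the helper names are hypothetical.

```python
def value(bits):
    """|x|: value of a bit tuple, most significant bit first."""
    v = 0
    for b in bits:
        v = 2 * v + b
    return v

def threshold(weights, T):
    """k-variate threshold function: 1 iff sum_i w_i * |x^(i)| >= T."""
    return lambda *args: int(
        sum(w * value(x) for w, x in zip(weights, args)) >= T)

# |x| + |y| = |z| as the conjunction of two threshold functions.
ge = threshold([1, 1, -1], 0)      # |x| + |y| - |z| >= 0
le = threshold([-1, -1, 1], 0)     # |z| - |x| - |y| >= 0

def sum_equals(x, y, z):
    """Binary synthesis (AND) of the two comparisons."""
    return ge(x, y, z) & le(x, y, z)
```

Both components have maximum absolute weight 1 and three arguments, which is the constant-parameter case for which the text states a compact OBDD size.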

3 Symbolic Shortest-Path Algorithms 

In this section, symbolic versions of two popular shortest-path algorithms are 
presented: Dijkstra’s algorithm [6] and the Bellman-Ford algorithm [2]. We as- 
sume that the reader is familiar with these two methods, and describe their 
symbolic versions in separate sections. Both solve the single-source shortest-path
problem in symbolically represented directed graphs $G = (V,E,s,c)$ with
node set $V = \{v_0, \ldots, v_{N-1}\}$, edge set $E \subseteq V \times V$ of cardinality $M$, source node
$s \in V$, edge weight function $c: E \to \mathbb{N}_0$, and $B := \max\{c(e) \mid e \in E\}$.

The maximum path length from $s$ to any node $v$ is $B(N-1) =: L$. Let
$n := \lceil \log(L+1) \rceil = \Theta(\log N + \log B)$ be the number of bits necessary to encode
one node number or distance value. The algorithms receive the input graph in
form of two OBDDs for the characteristic functions $C(x,y,d)$ and $s(x)$ with

$$C(x,y,d) = 1 \;\Leftrightarrow\; \big[(v_{|x|}, v_{|y|}) \in E\big] \wedge \big[c(v_{|x|}, v_{|y|}) = |d|\big],$$

$$s(x) = 1 \;\Leftrightarrow\; v_{|x|} = s.$$

The output is the distance function $\mathrm{dist}: V \to \mathbb{N}_0 \cup \{\infty\}$ which maps a node
$v \in V$ to the length of a shortest path from $s$ to $v$, given as an OBDD $DIST(x,d)$
with

$$DIST(x,d) = 1 \;\Leftrightarrow\; \mathrm{dist}(v_{|x|}) = |d|.$$

Both algorithms maintain a temporary distance function $\Delta: V \to \mathbb{N}_0 \cup \{\infty\}$
represented by an OBDD $D(x,d)$, which is updated until it equals dist.
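To make the bit width concrete, the following sketch computes the number of bits $n = \lceil \log_2(L+1) \rceil$ with $L = B(N-1)$ for given $N$ and $B$. The function name `bits_needed` is an illustrative assumption; it relies on the fact that Python's `int.bit_length()` equals $\lceil \log_2(L+1) \rceil$ for nonnegative integers.

```python
def bits_needed(N, B):
    """Bits per node number or distance value for N nodes and maximum weight B."""
    L = B * (N - 1)          # maximum length of a simple path
    return L.bit_length()    # equals ceil(log2(L + 1)) for L >= 0
```

For example, $N = 400$ and $B = 200$ give $L = 79800$ and hence 17 bits per argument.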

3.1 The Dijkstra-Approach

Dijkstra's algorithm [6] stores a node set $A \subseteq V$ of nodes for which the shortest-path
length is already known. At the beginning, it is $A = \{s\}$, $\Delta(s) = 0$, and
$\Delta(v) = \infty$ for all nodes $v \ne s$. In each iteration, we add one node to $A$. Let $u$
be the last node added to $A$. For each edge $(u,v)$ it is checked whether $\Delta(u) +
c(u,v) < \Delta(v)$. If this is the case, we update $\Delta(v)$ to $\Delta(u) + c(u,v)$. After this
relaxation step, we add a node $v^{\min} \in V \setminus A$ to $A$ whose $\Delta$-value is minimal.
If $\{v \in V \setminus A \mid \Delta(v) \ne \infty\} = \emptyset$, the actual distances $\Delta$ correspond to dist and we
terminate. If the nodes are stored in a priority heap with access time $O(\log N)$,
this explicit algorithm needs time $O((N + M) \cdot \log N)$.
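For reference, a minimal sketch of the explicit algorithm just described is given below. It assumes an adjacency-list input (a dict from node to a list of `(neighbor, weight)` pairs) and uses Python's `heapq` with lazy deletion of stale entries instead of a decrease-key operation; these representation choices are assumptions of the sketch, not details from the paper.

```python
import heapq

def dijkstra(adj, source):
    """adj: dict node -> list of (neighbor, weight); returns dict of finite distances."""
    dist = {source: 0}
    settled = set()                    # the node set A
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)     # node outside A with minimal Delta value
        if u in settled:
            continue                   # stale heap entry, u was settled earlier
        settled.add(u)
        for v, w in adj.get(u, []):    # relax all edges (u, v)
            if v not in dist or d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist
```

Unreachable nodes simply do not appear in the returned dictionary, which corresponds to $\Delta(v) = \infty$.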

This approach is now transformed into a symbolic algorithm that works with
corresponding OBDDs $A(x)$ and $D(x,d)$ for the characteristic functions of $A$
and $\Delta$. At the beginning, they are initialized to the source node:

$$A(x) := s(x),$$

$$D(x,d) := s(x) \wedge (|d| = 0).$$

$x^{\min}$ and $d^{\min}$ are bit strings representing the node $v^{\min} = v_{|x^{\min}|}$ lastly added
to $A$ with $\Delta(v^{\min}) = |d^{\min}|$. Initially, $x^{\min}$ represents $s$ and it is $v_{|x^{\min}|} = s$ and
$|d^{\min}| = 0$.

Now all edges leaving $v^{\min}$ have to be relaxed. We introduce three helping
functions which will be used to update $D(x,d)$: Function $RELAX(x,d)$ represents
pairs $(x,d)$ such that $(v^{\min}, v_{|x|}) \in E$ and $\Delta(v^{\min}) + c(v^{\min}, v_{|x|}) = |d|$.
$D_1(x,d)$ and $D_2(x,d)$ represent the two possibilities for nodes $v_{|x|} \notin A$: 1. It
is $D_1(x,d) = 1$ iff distance $|d|$ is the relaxed distance of $v_{|x|}$ not being larger
than the actual distance $\Delta(v_{|x|})$. 2. It is $D_2(x,d) = 1$ iff distance $|d|$ is the
actual distance $\Delta(v_{|x|})$ not being larger than the relaxed distance of $v_{|x|}$. Case
1 represents the update of $\Delta(v_{|x|})$, while Case 2 represents its retention. Finally,
the new $D(x,d)$ equals the actual $D(x,d)$ for nodes $v_{|x|} \in A$, while for
nodes $v_{|x|} \notin A$ Case 1 ($D_1(x,d)$) or Case 2 ($D_2(x,d)$) applies. This leads to the
following symbolic formulation:

$$RELAX(x,d) := (\exists d^{(1)}) \big[ C(x^{\min}, x, d^{(1)}) \wedge (|d| = |d^{\min}| + |d^{(1)}|) \big],$$

$$D_1(x,d) := RELAX(x,d) \wedge \neg (\exists d^{(1)}) \big[ D(x, d^{(1)}) \wedge (|d^{(1)}| < |d|) \big],$$

$$D_2(x,d) := D(x,d) \wedge \neg (\exists d^{(1)}) \big[ RELAX(x, d^{(1)}) \wedge (|d^{(1)}| < |d|) \big],$$

$$D(x,d) := \big[ A(x) \wedge D(x,d) \big] \vee \big[ \neg A(x) \wedge \big( D_1(x,d) \vee D_2(x,d) \big) \big].$$
Next, we select the new minimal node $v^{\min}$. If there are several nodes with
minimal value $\Delta(v)$, we select the node $v_{|x|}$ with the smallest node number $|x|$.
Hence, we need a comparison function for two node-distance-pairs denoted by
$LESS(x,d,y,d')$.

$$LESS(x,d,y,d') := (|d| < |d'|) \vee \big[ (d = d') \wedge (|x| < |y|) \big]$$

The facts on multivariate threshold functions in Sect. 2 imply that comparisons
like $LESS(x,d,y,d')$ and $(|d| = |d^{\min}| + |d^{(1)}|)$ have OBDD-size $O(n)$. Now we
define the selection function $SEL(x,d)$.

$$SEL(x,d) := \neg A(x) \wedge D(x,d) \wedge \neg (\exists x^{(1)}, d^{(1)}) \big[ \neg A(x^{(1)}) \wedge D(x^{(1)}, d^{(1)}) \wedge LESS(x^{(1)}, d^{(1)}, x, d) \big]$$

The interpretation of this functional assignment is that the node-distance-pair
$(v_{|x|}, |d|)$ is selected iff $v_{|x|} \notin A$, $\Delta(v_{|x|}) = |d|$, and there is no other node-distance
pair $(v_{|x^{(1)}|}, |d^{(1)}|)$ with these properties and $|d^{(1)}| < |d|$ or $(d^{(1)} = d) \wedge (|x^{(1)}| < |x|)$.

If $SEL(x,d) = 0$, all nodes reachable from $s$ have been added to $A$ and we may
terminate with output $DIST(x,d) = D(x,d)$. Otherwise, $SEL(x,d)$ contains
exactly one satisfying assignment for $x$ and $d$. This can be extracted in linear
time w.r.t. size($SEL$) [19]. Finally, we just need to add $v^{\min}$ to $A(x)$.

$$(x^{\min}, d^{\min}) := SEL^{-1}(1),$$

$$A(x) := A(x) \vee (x = x^{\min}).$$

Afterwards, we jump to the relaxation step. The correctness of this symbolic 
procedure follows from the correctness of Dijkstra’s algorithm, while we now 
consider the number of executed OBDD-operations. 
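The relaxation and selection steps can be emulated with explicit Python sets of tuples standing in for the OBDDs. This is only a sketch under strong simplifications: sets replace characteristic functions, set comprehensions replace quantification and binary synthesis, and the function name `symbolic_dijkstra` and the triple encoding of $C$ are assumptions of this illustration. It says nothing about OBDD sizes or operation costs.

```python
def symbolic_dijkstra(C, source):
    """C: set of (x, y, w) triples for edges x -> y of weight w.
    Returns the relation DIST as a set of (node, distance) pairs."""
    A = {source}                       # A(x) := s(x)
    D = {(source, 0)}                  # D(x,d) := s(x) and |d| = 0
    x_min, d_min = source, 0
    while True:
        # RELAX(x,d): some edge (x_min, x) with d = d_min + weight
        RELAX = {(y, d_min + w) for (x, y, w) in C if x == x_min}
        # D1: relaxed distance, no smaller current distance exists for x
        D1 = {(x, d) for (x, d) in RELAX
              if not any(x1 == x and d1 < d for (x1, d1) in D)}
        # D2: current distance, no smaller relaxed distance exists for x
        D2 = {(x, d) for (x, d) in D
              if not any(x1 == x and d1 < d for (x1, d1) in RELAX)}
        # keep D on A, take D1 or D2 outside A
        D = ({(x, d) for (x, d) in D if x in A}
             | {(x, d) for (x, d) in (D1 | D2) if x not in A})
        # SEL: LESS-minimal pair outside A; (d, x) tuples order like LESS
        candidates = [(d, x) for (x, d) in D if x not in A]
        if not candidates:             # SEL = 0: terminate with DIST = D
            return D
        d_min, x_min = min(candidates)
        A.add(x_min)                   # A(x) := A(x) or (x = x_min)
```

Comparing `(d, x)` tuples lexicographically mirrors $LESS$: smaller distance first, then smaller node number.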

Theorem 1. The symbolic Dijkstra-approach computes the output OBDD
$DIST(x,d)$ by $O(N \cdot \log(NB))$ OBDD-operations.

Proof. All nodes reachable from $s$ are added to $A(x)$. That is, at most $N$ relaxation
and selection iterations are executed. In each iteration, the algorithm
performs a constant number of cheap operations, argument swaps, binary syntheses,
and quantifications over node or distance arguments. Each of the latter
corresponds to $O(n) = O(\log(NB))$ quantifications over single Boolean variables.
Altogether, $O(N \cdot \log(NB))$ OBDD-operations are executed. □

We have also studied a parallelized symbolic version of Dijkstra’s algorithm, 
which selects not only one distance-minimal node to be handled in each iteration, 
but a maximal set of independent nodes not interfering by adjacency. Experiments
showed that the parallelization could compensate for the overhead caused by
the more complex symbolic formulation only for graphs of very special structure,
which is why this approach is not discussed in this work.
3.2 The Bellman-Ford-Approach

In contrast to Dijkstra's algorithm, the Bellman-Ford algorithm [2] does not
select special edges to relax, but performs $N$ iterations over all edges $(u,v) \in E$
to check the condition $\Delta(u) + c(u,v) < \Delta(v)$ and to update $\Delta(v)$ if necessary.
Therefore, its explicit runtime is $O(NM)$. In contrast to Dijkstra's algorithm,
Bellman-Ford is able to handle graphs with negative edge weights if they do not
contain negative cycles. Furthermore, it is easy to parallelize, which motivated
the development of a symbolic version: a few OBDD-operations hopefully perform
many edge relaxations at once.

Again, the actual distance function $D(x,d)$ is only known for the source $s$ at
the beginning:

$$D(x,d) := s(x) \wedge (|d| = 0).$$

We again need a function $RELAX(x,y,d)$ representing the candidates for edge
relaxation. Let $RELAX(x,y,d) = 1$ iff $\Delta(v_{|x|}) + c(v_{|x|}, v_{|y|}) = |d|$ and $|d|$ is not
larger than the actual $\Delta(v_{|y|})$.

$$RELAX(x,y,d) := (\exists d^{(1)}, d^{(2)}) \big[ D(x, d^{(1)}) \wedge C(x,y,d^{(2)}) \wedge (|d| = |d^{(1)}| + |d^{(2)}|) \big] \wedge \neg (\exists d^{(1)}) \big[ D(y, d^{(1)}) \wedge (|d^{(1)}| < |d|) \big]$$

If $RELAX(x,y,d) = 0$, no relaxations are applicable and $D(x,d) = DIST(x,d)$
represents the correct output, so we may terminate. Otherwise, we use the comparison
function $LESS(x,d,y,d')$ to choose the subset that is minimal w.r.t.
distance $|d|$ and, secondly, the node number $|x|$:

$$SEL(x,y,d) := RELAX(x,y,d) \wedge \neg (\exists x^{(1)}, d^{(1)}) \big[ RELAX(x^{(1)}, y, d^{(1)}) \wedge LESS(x^{(1)}, d^{(1)}, x, d) \big].$$

Finally, we compute the symbolic set $U(x,d)$ of node-distance pairs that have
to be updated in $D(x,d)$ because they were part of a selected relaxation:

$$U(x,d) := (\exists x^{(1)})\, SEL(x^{(1)}, x, d),$$

$$D(x,d) := U(x,d) \vee \big[ \neg (\exists d^{(1)})\, U(x, d^{(1)}) \wedge D(x,d) \big].$$

In this way, the new distances of $U(x,d)$ are taken over into $D(x,d)$, while the
other nodes keep their distance value. The new iteration starts with computing
$RELAX(x,y,d)$. Again, the correctness follows from the correctness of the
explicit Bellman-Ford algorithm.
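One iteration scheme of this kind can be emulated with explicit Python sets in place of OBDDs. The sketch below is an illustration only: the name `symbolic_bellman_ford` and the triple encoding of $C$ are assumptions, and it additionally assumes that a relaxation candidate must strictly improve the current distance of its target, so that the loop terminates once no distance changes.

```python
def symbolic_bellman_ford(C, source):
    """C: set of (x, y, w) triples for edges x -> y of weight w >= 0.
    Returns the relation DIST as a set of (node, distance) pairs."""
    D = {(source, 0)}                              # D(x,d) := s(x) and |d| = 0
    while True:
        dist = dict(D)                             # one distance per node in D
        # RELAX(x,y,d): d = Delta(x) + c(x,y), strictly improving Delta(y)
        # (strictness is an assumption of this sketch)
        RELAX = {(x, y, dist[x] + w) for (x, y, w) in C
                 if x in dist and (y not in dist or dist[x] + w < dist[y])}
        if not RELAX:                              # RELAX = 0: D equals DIST
            return D
        # SEL: for each target y keep the LESS-minimal (distance, source) pair
        best = {}
        for (x, y, d) in RELAX:
            if y not in best or (d, x) < best[y]:
                best[y] = (d, x)
        # U(x,d): existential quantification over the source node
        U = {(y, d) for y, (d, _) in best.items()}
        updated = {y for (y, _) in U}
        # new D: updated pairs, plus old pairs of nodes without an update
        D = U | {(x, d) for (x, d) in D if x not in updated}
```

All targets with an applicable relaxation are updated in the same pass, which is the parallelism the symbolic version aims for.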

Theorem 2. The symbolic Bellman-Ford-approach computes the output OBDD
$DIST(x,d)$ by $O(NM \cdot \log(NB))$ OBDD-operations.

Proof. Every implementation of the Bellman-Ford-algorithm performs at most
$O(NM)$ edge relaxations. In each iteration, the symbolic method relaxes at least
one edge, and executes a constant number of cheap operations, argument swaps,
binary syntheses, and quantifications over node and distance arguments. Each
of the latter corresponds to $O(n) = O(\log(NB))$ quantifications over single
Boolean variables. Altogether, $O(NM \cdot \log(NB))$ OBDD-operations are executed. □
3.3 Computing the Predecessor Nodes

Besides dist, explicit shortest-path algorithms return for each node $v \in V$ a
predecessor node $\mathrm{pred}(v) =: u$, such that there is a shortest path from $s$ to $v$
which uses the edge $(u,v)$. Analogously, the symbolic approaches can be modified
such that they also compute the predecessor nodes on shortest paths.

The following method computes these just from the final $DIST(x,d)$ and is
independent of the considered symbolic algorithm. It uses the helping function
$P(x,y,d^{(1)},d^{(2)},d^{(3)})$ which is satisfied iff $\mathrm{dist}(v_{|x|}) = |d^{(1)}|$, $c(v_{|x|}, v_{|y|}) = |d^{(2)}|$,
$\mathrm{dist}(v_{|y|}) = |d^{(3)}|$, and $|d^{(1)}| + |d^{(2)}| = |d^{(3)}|$. By existential quantification over
the distances $d^{(1)}$, $d^{(2)}$, and $d^{(3)}$, we obtain the function $PREDS(x,y)$, which
represents exactly all edges being part of some shortest path (i.e., for which $v_{|x|}$
is a predecessor of $v_{|y|}$).

$$P(x,y,d^{(1)},d^{(2)},d^{(3)}) := DIST(x,d^{(1)}) \wedge C(x,y,d^{(2)}) \wedge DIST(y,d^{(3)}) \wedge (|d^{(1)}| + |d^{(2)}| = |d^{(3)}|),$$

$$PREDS(x,y) := (\exists d^{(1)}, d^{(2)}, d^{(3)})\, P(x,y,d^{(1)},d^{(2)},d^{(3)}).$$

If we are only interested in an arbitrary predecessor of a concrete node $v_{|y^*|}$, we
may omit the computation of $PREDS(x,y)$ by replacing argument $y$ of $P$ by $y^*$
and extracting an arbitrary satisfying variable assignment $x^*$ of $P$. Therefore,
computing $DIST(x,d)$ is the essential part of symbolic shortest-path algorithms,
which has been analyzed by means of the experiments documented in Sect. 4.
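The predecessor relation can likewise be sketched on explicit sets: an edge belongs to some shortest path exactly when the distance of its source plus its weight equals the distance of its target. The function name `preds` and the tuple encodings of $DIST$ and $C$ are assumptions of this illustration.

```python
def preds(DIST, C):
    """DIST: set of (node, distance) pairs; C: set of (x, y, w) edge triples.
    Returns all edges (x, y) with dist(x) + c(x, y) = dist(y)."""
    dist = dict(DIST)
    return {(x, y) for (x, y, w) in C
            if x in dist and y in dist and dist[x] + w == dist[y]}
```

A single predecessor of a fixed node $y^*$ is then any $x$ with `(x, y_star)` in the returned set.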

Remark 1. The worst-case behavior of a particular OBDD-operation executed
by a symbolic algorithm can be obtained from the general bound $(2+o(1))\,2^n/n$
for the OBDD-size of any function $f \in B_n$ together with the worst-case bounds
for runtime and space in Sect. 2.

Analogously to Theorem 2 in [18], it can be shown that constant width bounds of
input OBDD C{x,y,d) and output OBDD DIST{x,d) imply a polylogarithmic 
upper bound on time and space for each operation. However, we did not want to 
restrict ourselves to such special cases and applied the Dijkstra-approach as well 
as the Bellman-Ford-approach in experiments to obtain more general results. 

4 Experimental Results 

Although the symbolic Bellman-Ford-approach has a higher worst-case bound for
the number of OBDD-operations than the Dijkstra-approach, we hope that each
of its iterations relaxes many edges in parallel, leading to a sublinear number of
operations. On the other hand, symbolic sets like $RELAX(x,y,d)$ may
contain much poorly structured information, causing larger OBDDs than in the Dijkstra-approach.
That is, we expect the Bellman-Ford method to need more space, while hoping
that the smaller operation count results in less overall runtime.

In order to check these hypotheses, the symbolic shortest-path algorithms 
have been applied in experiments on random, grid, and threshold graphs. Because
the OBDD-size size($G$) of a symbolic algorithm's input graph $G$ is a natural
lower bound for its resource usage, we also investigate the experimental behavior
w.r.t. these input sizes. This allows us to measure performance independently of
how well an input $G$ is suited for OBDD-representation.



Experiment setting. Both symbolic algorithms have been implemented¹ in
C++ using the OBDD package CUDD 2.3.1 by Fabio Somenzi². An interleaved
variable order with increasing bit significance has been used for the Boolean 
variables of each function argument. The experiments took place on a PC with 
Pentium 4 2GHz processor and 512 MB of main memory. The runtime has been 
measured by seconds of process time, while the space usage is given as the maxi- 
mum number of OBDD-nodes present at any time during an algorithm execution. 
The latter is of same magnitude as the over-all space usage and independent of 
the used computer system. 

4.1 Random Graphs 

Random graphs possess no regular structure and are therefore pathological
cases for symbolic representations: they have expected OBDD-size
$O(N^2/\log N)$. Only for dense graphs is some compression achieved because,
intuitively speaking, the OBDD stores the smaller number of missing edges instead
of all existing ones. Hence, we cannot expect symbolic methods to beat explicit
algorithms on random graphs. But even in such worst cases their runtime and
space usage may be only linear w.r.t. the (correspondingly large) input OBDD-sizes,
which is the best we can expect from symbolic methods in general.

Both presented symbolic shortest-path algorithms have been tested on ran- 
dom graphs with 100, 200, 300, and 400 nodes and edge probabilities from 0.05 
to 1 in steps of 0.05 influencing the observed edge density. Node 0 served as 
source. Moreover, three edge weight functions of different regularity have been 
considered. The documented experimental results are the averages of results of 
10 independent experiments for each parameter setting. The individual results
deviated only slightly from their averages.



Constant edge weights. At first, the constant edge weight function c(e) = 
1 has been considered. This structural assumption causes a slightly sublinear 
growth of the symbolic representation’s OBDD-size w.r.t. the edge probability 
(see Fig. 1(a)). Figures 1(b) to 1(e) show the observed runtimes and space usage 
of both algorithms w. r. t. the edge probabilities, where “ParBF” identifies the 
symbolic (parallelized) Bellman-Ford-approach. 

As expected, the Dijkstra method uses less space, while Bellman-Ford has
lower runtimes. Figures 1(f) and 2(a) integrate all runtimes resp. space usages into
one plot w.r.t. the input graphs' OBDD-sizes, which establishes the Dijkstra-space
resp. Bellman-Ford-time connection: The space usage of the Dijkstra-approach
grows linearly with the input graph's OBDD-size with the same offset and
gradient for all considered numbers of nodes $N$, while this is not the case for the
Bellman-Ford-approach. For the runtime, the situation is vice versa: Only the
Bellman-Ford approach shows uniform linear runtime growth.

¹ Implementation and experiments available at http://thefigaro.sourceforge.net/.
² CUDD is available at http://vlsi.colorado.edu/.



Difference edge weights. In order to proceed to a less simple weight function
than constant weights, difference edge weights have been considered:

$$c(v_a, v_b) := |a - b| \bmod 200.$$

The modulo-operation was used to bound the gap between maximum weights
for the different numbers of nodes $N = 100$ to $400$.

This weight function can be composed of multivariate threshold functions,
and has OBDD-size $O(\log N)$. Accordingly, the OBDD-sizes of random graphs
with difference weights are not essentially larger than for constant weights (see
Fig. 2(b)). Again, Dijkstra dominates w.r.t. space usage and Bellman-Ford dominates
w.r.t. runtime, while the general resource usage is higher than for constant
weights (see Tables 1(a) and 1(b)). The dependence of time and space on
the input OBDD-sizes is given by Figs. 2(c) and 2(d): While Dijkstra's space
still grows linearly with the same offset and gradient for all node numbers,
Bellman-Ford's runtime behavior now differs for different $N$.



Random edge weights. Finally, random graphs with random edge weights
between 1 and 200 have been considered in experiments. Figure 2(e) shows
that their OBDD-sizes grow linearly with the edge density, because the random
weights prohibit the space savings observed for the two other weight functions.
The runtimes w.r.t. edge probabilities and numbers of nodes $N$ are given by
Tables 1(c) and 1(d), while the dependence of time and space on the input
OBDD-sizes is given by Figs. 2(f) and 3(a). The general resource usage further
increased in comparison to difference weights. The missing structure of the inputs
leads to nearly the same runtime for Dijkstra and Bellman-Ford; the latter
is no longer able to compensate for its larger space requirements by fewer operations.
In contrast, the advantage of the Dijkstra-approach still remains: Its space
usage grows linearly with the same offset and gradient for all considered edge
probabilities $p$ and numbers of nodes $N$.

4.2 Grid and Threshold Graphs 

In contrast to random graphs, the grid and threshold graphs considered in
this section are examples of structured inputs with logarithmic OBDD-size
$O(\log N)$ [16,20], whose OBDDs can be constructed efficiently. Hence, we expect
a useful symbolic algorithm to use only polylogarithmic resources in these cases.

Grid graphs. Both algorithms have been applied to $2^{n/2} \times 2^{n/2}$-grid graphs,
which are quadratic node matrices of $2^n$ nodes $(i,j)$, $i,j \in \{0, \ldots, 2^{n/2} - 1\}$, with
vertical edges $((i,j), (i+1,j))$ and horizontal edges $((i,j), (i,j+1))$. Grids of
Fig. 1. Experimental results on random graphs.

Fig. 2. Experimental results on random graphs. Panels: (a) random edges, constant weights; (b) random edges, difference weights; (c) random edges, difference weights; (d) random edges, difference weights; (e) random edges, random weights; (f) random edges, random weights. Horizontal axes: edge probability or OBDD-size of C(x,y,d).

Fig. 3. Experimental results on random, grid, and threshold graphs. Panels: (a) random edges, random weights (horizontal axis: OBDD-size of C(x,y,d)); (b) grid and threshold graphs (horizontal axis: node number exponent).


size 2^ to 2^® with source node (0, 0) and constant edge weight 1 have been 
considered. Because these should be examples for graphs of optimal symbolic 
representation, no random weights were used in the experiments. 



Threshold graphs. Threshold graphs [12] have compact OBDDs of size
$O(\log N)$ if their degree sequence or construction sequence has a compact symbolic
representation [17]. In particular, this is the case for graphs with nodes
$v_0, \ldots, v_{N-1}$ and edges

$$(v_a, v_b) \in E \;\Leftrightarrow\; a + b > T, \quad T \in \mathbb{N}.$$

Both symbolic shortest-path algorithms have been applied to such threshold
graphs for $N = 2^n$, $T = 2^{n-1}$, and $n = 2, \ldots, 12$. As edge weight, the difference
of Sect. 4.1 without modulo-operation has been chosen.



Results. For both algorithms on grid and threshold graphs, Fig. 3(b) shows the
dependency of space usage on the node number exponent $n$, where the Dijkstra-approach
is again dominating. Nevertheless, the linear growth of all four plots
implies logarithmic growth w.r.t. $N$, which is the best case for symbolic algorithms'
behavior. In general, this convenient property cannot be deduced just
from logarithmic input OBDD-size.

Because Dijkstra's runtime is at least linear in the number of reachable nodes
and Bellman-Ford's runtime is at least linear in the minimum number of
edges on any $s$-$v$-path, no polylogarithmic runtime can be obtained on grid
graphs. Despite this theoretical fact, Bellman-Ford performed very efficiently on
grids, while Dijkstra's runtime became very inefficient for the exponentially growing
grid sizes (see Table 2(a)). Moreover, in experiments on grids with a number
of $20 \log N$ randomly added edges, both algorithms' runtime did not change
essentially in comparison to unmodified grids.

Table 1. Experimental runtime results on random graphs. (a) Dijkstra, random edges, difference weights. (b) ParBF, random edges, difference weights. (c) Dijkstra, random edges, random weights. (d) ParBF, random edges, random weights.

Table 2(b) shows the observed runtimes on the considered threshold graphs.
Due to the very small runtimes of the Bellman-Ford-approach, we cannot deduce
anything about its behavior except that it again performs much more
efficiently than the Dijkstra-approach. Both on grids and threshold graphs with
n > 8, it was even able to beat an explicit shortest-path algorithm implemented
in LEDA³ version 4.3.

³ Available at http://www.algorithmic-solutions.com/.

Table 2. Experimental runtime results on grid and threshold graphs. (a) Grid graphs. (b) Threshold graphs.



5 Conclusions 

Two symbolic algorithms for the single-source shortest-path problem on OBDD-represented
graphs with nonnegative integral edge weights have been presented
which execute $O(N \cdot \log(NB))$ resp. $O(NM \cdot \log(NB))$ OBDD-operations. Although
Bellman-Ford's worst-case bound is the larger one, this symbolically
parallelized approach was expected to have better runtime but higher space usage
than the Dijkstra-approach. This was confirmed by experiments on random
graphs with constant and difference weights as well as on grid and threshold
graphs. Dijkstra's space usage was always of linear magnitude w.r.t. the size of
its input OBDDs with a relative error of less than 0.06. For the Bellman-Ford-approach,
this property was only observed on grid and threshold graphs as well
as for the runtime on random graphs with constant edge weights.

Altogether, experiments both on pathological instances (random graphs) and 
structured graphs well-suited for symbolic representation (grid and threshold 
graphs) show that for each of the resources time resp. space at least one 
algorithm performs well or even asymptotically optimal w. r. t. the input 
OBDD-size. Hence, both shortest-path algorithms can be considered as useful 
symbolic methods with individual strengths. 



Acknowledgment. Thanks to Ingo Wegener for proofreading and helpful discussions.

Abstract. The maximum diversity problem (MDP) consists of identi-
fying optimally diverse subsets of elements from some larger collection. 
The selection of elements is based on the diversity of their characteristics, 
calculated by a function applied on their attributes. This problem be- 
longs to the class of NP-hard problems. This paper presents new GRASP 
heuristics for this problem, using different construction and local search 
procedures. Computational experiments and performance comparisons 
between GRASP heuristics from literature and the proposed heuristics 
are provided and the results are analyzed. The tests show that the new 
GRASP heuristics are quite robust and find good solutions to this prob- 
lem. 



1 Introduction 

The maximum diversity problem (MDP) [5,6,7] consists of identifying optimally 
diverse subsets of elements from some larger collection. The selection of elements 
is based on the diversity of their characteristics, calculated by a function applied 
on their attributes. The goal is to find the subset that presents the maximum 
possible diversity. Many practical problems [10] can be modeled as instances of
this problem, such as planning medical treatments, selecting jury panels,
scheduling final exams, and VLSI design. This problem belongs to the class of
NP-hard problems [6].

Glover et al. [6] presented a mixed integer zero-one formulation for this problem
that can be solved for small instances by exact methods. Bhadury et al. [3]
developed an exact algorithm using a network flow approach for the diversity
problem of working groups for a graduate course.

Some heuristics are available to obtain approximate solutions. Weitz and
Lakshminarayanan [12] developed five heuristics to find groups of students with
the most diverse characteristics possible, such as nationality, age and graduation
level. They tested the heuristics using instances based on real data and
implemented an exact algorithm for solving them. The heuristic LCW (Lotfi-Cerveny-Weitz
method) was considered the best for solving these instances.

Constructive and destructive heuristics were presented by Glover et al. [7],
who created instances with different population sizes (the maximum value was 30)
and showed that the proposed heuristics obtained results close to (within 2% of) the ones
obtained by the exact algorithm, but much faster.

Kochenberger and Glover [10] showed results obtained using a tabu search
and Katayama and Narihisa [9] developed a memetic algorithm. Both report
that computational experiments were carried out, but they did not compare the
performance of their algorithms with exact or other heuristic procedures.

Ghosh [5] proposed a GRASP (Greedy Randomized Adaptive Search Pro- 
cedure) that obtained good results for small instances of the problem. Andrade 
et al. [2] developed a new GRASP and showed results for instances randomly 
created with a maximum population of 250 individuals. This algorithm was able 
to find some solutions better than the ones found by the Ghosh algorithm. 

GRASP [4] is an iterative process, where each iteration consists of two phases: 
construction and local search. In the construction phase a feasible solution is 
built, and its neighborhood is explored by a local search. The result is the best 
solution found over all iterations. In Section 2 we describe three construction pro- 
cedures developed using the concept of reactive GRASP introduced by Prais and 
Ribeiro [11], and two local search strategies. In Section 3 we show computational 
results for different versions of GRASP heuristics created by the combination of 
a constructive algorithm and a local search strategy described in Section 2. Concluding
remarks are presented in Section 4.



2 GRASP Heuristics 

The construction phase of GRASP is an iterative process where, at each iteration,
the elements $c \in C$ that do not belong to the solution are evaluated by a greedy
function $g : C \to \mathbb{R}^+$ that estimates the gain of including each of them in the partial
solution. They are ordered by their estimated value in a list called restricted
candidate list (RCL) and one of them is randomly chosen and included in the
solution. The size of the RCL is limited by a parameter $\alpha$. For a maximization
problem, only the elements whose $g$ values are in the range $[(1 - \alpha)\, g_{\max}, g_{\max}]$
are placed in the RCL. This process stops when a feasible solution is obtained.
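A minimal sketch of this RCL rule for a maximization problem follows. The function names `build_rcl` and `pick`, the callable greedy function `g`, and the optional random generator argument are assumptions made for illustration.

```python
import random

def build_rcl(candidates, g, alpha):
    """Keep the candidates whose greedy value lies in [(1 - alpha) * g_max, g_max]."""
    g_max = max(g(c) for c in candidates)
    return [c for c in candidates if g(c) >= (1 - alpha) * g_max]

def pick(candidates, g, alpha, rng=random):
    """Randomly choose one element of the restricted candidate list."""
    return rng.choice(build_rcl(candidates, g, alpha))
```

With `alpha = 0` the choice is purely greedy, while `alpha = 1` admits every candidate with a nonnegative greedy value, i.e., a purely random choice.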

Prais and Ribeiro [11] proposed a new procedure called Reactive GRASP, in
which the parameter $\alpha$ used in the construction phase is self-adjusted at each
iteration. For the first construction iteration, an $\alpha$ value is randomly selected
from a discrete set $A = \{\alpha_1, \ldots, \alpha_m\}$. Each element $\alpha_i$ has an associated
probability $p_i$ and, initially, a uniform distribution is applied, thus we have $p_i =
1/m$, $i = 1, \ldots, m$. Periodically the probability distribution $p_i$, $i = 1, \ldots, m$, is
updated using information collected during the former iterations. The aim is to
associate higher probabilities with values of $\alpha$ that lead to better solutions and
lower ones with values of $\alpha$ that lead to worse solutions.

The solutions generated by the construction phase are not guaranteed to be 
locally optimal. Usually a local search is performed to attempt to improve each 
constructed solution. It works by successively replacing the current solution by a 
better one from its neighborhood, until no more better solutions are found. Nor- 
mally, this phase demands great computational effort and execution time, so the 
construction phase plays an important role in diminishing this effort by supplying
good starting solutions for the local search. We implemented a widely used technique
for this task, which leads to a greedier construction. For each
GRASP iteration, the construction algorithm is executed X times generating X 
solutions and only the best solution is selected to be used as the initial solution 
for the local search phase. 

In the next subsections, we describe the construction and local search algo- 
rithms developed for the GRASP heuristics, using the concepts discussed in this 
section. 



2.1 Construction Phase 

Let $E = \{e_i : i \in N\}$, $N = \{1, 2, \ldots, n\}$ be a population of $n$ elements and $e_{ik}$,
$k \in L = \{1, 2, \ldots, l\}$, the $l$ values of the attributes of each element. In this paper,
we measure the diversity between any two elements $i$ and $j$ by the Euclidean
distance calculated as $d_{ij} = \sqrt{\sum_{k=1}^{l} (e_{ik} - e_{jk})^2}$. Let $M$ be a subset of $N$ and
the overall diversity be $z(M) = \sum_{i<j;\; i,j \in M} d_{ij}$. The MDP problem consists of
maximizing the cost function $z(M)$, subject to $|M| = m$.
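The objective can be computed directly from the attribute vectors. The sketch below assumes that the overall diversity is the sum of pairwise Euclidean distances over the selected subset; the name `diversity` and the dict-based encoding of the population are illustrative assumptions.

```python
from itertools import combinations
from math import dist

def diversity(elements, M):
    """elements: dict index -> attribute tuple; M: iterable of selected indices.
    Returns z(M), the sum of Euclidean distances over all pairs i < j in M."""
    return sum(dist(elements[i], elements[j])
               for i, j in combinations(sorted(M), 2))
```

Evaluating $z(M)$ this way takes $O(m^2 l)$ time for $|M| = m$ and $l$ attributes.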

We describe three construction algorithms developed to be used in GRASP 
heuristics where all of them use the techniques described before: Reactive 
GRASP and filtering of constructed solutions. 



K larger distances heuristic (KLD). This algorithm constructs an initial
solution by randomly selecting an element from an RCL of size $K$ at each construction
iteration. The RCL is created by selecting, for each element $i \in N$, the
$K$ elements $j \in N \setminus \{i\}$ that exhibit the largest values of $d_{ij}$ and summing these $K$ values
of $d_{ij}$, obtaining $s_i$. Then, we create a list of all elements $i$ sorted in descending
order by their $s_i$ values and select the first $K$ elements to compose the RCL.

The procedure developed to implement the reactive GRASP takes $m\_it$ to be
the total number of GRASP iterations. In the first block of iterations,
$B_1 = 0.4\,m\_it$, we evaluate four different values
$K \in \{K_1, K_2, K_3, K_4\}$. The evaluation is done by dividing the block
into four equal intervals $c_i$, $i = 1, \ldots, 4$, and using the value $K_i$
for all iterations belonging to interval $c_i$. The values of $K_i$ are shown
in Tab. 1, where $\mu = (n - m)/2$. After the execution of the last iteration
of block $B_1$, we evaluate the quality of the solutions obtained for each
$K_i$. We calculate the mean diversity value

$$zm_i = \sum_{1 \le q \le 0.1\,m\_it} z(sol_{iq})$$

for the solutions $sol_{iq}$, $i = 1, \ldots, 4$; $q = 1, \ldots, 0.1\,m\_it$,
obtained using each $K_i$. The values $K_i$ are stored in a list LK ordered by
their $zm_i$ values.



Table 1. K values for block B_1

i    c_i                              K
1    [1, ..., 0.1 m_it]               m + μ - 0.2μ
2    (0.1 m_it, ..., 0.2 m_it]        m + μ - 0.1μ
3    (0.2 m_it, ..., 0.3 m_it]        m + μ + 0.1μ
4    (0.3 m_it, ..., 0.4 m_it]        m + μ + 0.2μ



Table 2. K values for block B_2 (lk_i is the i-th value in list LK)

i    v_i                              K
1    [0.4 m_it, ..., 0.64 m_it]       lk_1
2    (0.64 m_it, ..., 0.82 m_it]      lk_2
3    (0.82 m_it, ..., 0.94 m_it]      lk_3
4    (0.94 m_it, ..., m_it]           lk_4


Then, for the next block of iterations $B_2 = 0.6\,m\_it$, we divide it into
four intervals $v_i$, each with a different number of iterations, and use the
$K_i$ values as shown in Tab. 2. In this way, the values $K_i$ that provide
better solutions are used in a larger number of iterations.
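The two-block schedule can be illustrated with the following sketch. The function names and the treatment of the boundary iteration 0.4 m_it (assigned here to the first block) are assumptions; the interval limits are those of Tabs. 1 and 2.

```python
def rank_by_mean(values, costs_per_value):
    """Order the candidate values by the mean cost of the solutions
    obtained with each of them, best first."""
    means = [sum(c) / len(c) for c in costs_per_value]
    order = sorted(range(len(values)), key=lambda i: means[i], reverse=True)
    return [values[i] for i in order]

def reactive_choice(it, m_it, block1_values, ranked_values):
    """Pick the parameter value for GRASP iteration `it` (1-based)."""
    frac = it / m_it
    if frac <= 0.4:
        # Block B_1: four equal intervals, one candidate value per interval.
        for i, upper in enumerate((0.1, 0.2, 0.3, 0.4)):
            if frac <= upper:
                return block1_values[i]
    # Block B_2: better-ranked values are used in longer intervals.
    for i, upper in enumerate((0.64, 0.82, 0.94, 1.0)):
        if frac <= upper:
            return ranked_values[i]
    raise ValueError("iteration outside 1..m_it")
```

With m_it = 500, the best-ranked value is used for 120 iterations of the second block and the worst-ranked one for only 30.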

At each GRASP iteration, we apply the filter technique for this heuristic by 
constructing 400 solutions and only the best solution is sent to the local search 
procedure. 

The pseudo-code, including the description of the procedure for the
construction phase using the K larger distances heuristic, is given in Fig. 1.



procedure constr_KLD(it_GRASP, m_it, numsol, n, m)
 1. best_cost_sol ← 0;
 2. K ← det_K(it_GRASP, m_it, LK, i);
 3. numsol[i] ← numsol[i] + 1;
 4. RCL ← Build_RCL(K);
 5. for j = 1, ..., max_sol_filter do
 6.     sol ← {};
 7.     for k = 1, ..., m do
 8.         Randomly select an individual e* from RCL;
 9.         sol ← sol ∪ {e*};
10.         RCL ← RCL - {e*};
11.     end for;
12.     if (z(sol) > best_cost_sol) then do
13.         sol_constr ← sol;
14.         best_cost_sol ← z(sol);
15.     end if;
16. end for;
17. sol_eval[i, numsol[i]] ← z(sol_constr);
18. if (it_GRASP == 0.4 m_it) then do
19.     LK ← Build_LK(sol_eval);
20. end if;
21. return sol_constr.



Fig. 1. Construction procedure used to implement the KLD heuristic

In line 1, we initialize the cost of the best solution found in the execution
of max_sol_filter iterations. The value K to be used to build the Restricted
Candidate List (RCL) is calculated by the procedure det_K in line 2. This
procedure defines the value for K, implementing the reactive GRASP described
before. In line 3, the number of solutions found for a specific K is updated
and, in line 4, the RCL is built. From line 5 to line 16, the construction
procedure is executed max_sol_filter times and only the best solution is
returned to be used as an initial solution by the local search procedure. From
line 7 to line 11, a solution is constructed by the random selection of
elements from the RCL. In lines 12 to 15, we update the best solution found by
the construction procedure, and the cost of the solution found using the
selected K is stored in line 17. When the first block $B_1$ of iterations
ends, the values $K_i$ are evaluated and put in the list LK, sorted in
descending order, in line 19.
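The filtering loop of lines 5 to 16 of Fig. 1 might look as follows in Python. This is a sketch under stated assumptions: the name `construct_with_filter` is hypothetical, and each candidate solution is drawn from a fresh copy of the RCL, a detail the pseudo-code leaves implicit.

```python
import random
from itertools import combinations

def construct_with_filter(d, rcl, m, n_candidates, rng):
    """Build n_candidates random solutions of size m from the RCL and
    return the one with the largest diversity, together with its cost."""
    best_sol, best_cost = None, float("-inf")
    for _ in range(n_candidates):
        pool = list(rcl)  # assumption: each candidate starts from the full RCL
        sol = set()
        for _ in range(m):
            # Randomly pick an element and remove it from the pool.
            sol.add(pool.pop(rng.randrange(len(pool))))
        cost = sum(d[i][j] for i, j in combinations(sorted(sol), 2))
        if cost > best_cost:
            best_sol, best_cost = sol, cost
    return best_sol, best_cost
```

Only the returned solution would be passed on to the local search phase.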



K larger distances heuristic-v2 (KLD-v2). This algorithm is similar to the
previously described one; the difference between them is the way the
Restricted Candidate List is built. In the former algorithm, the RCL is
computed before the execution of the construction iterations and, for each
iteration, the only modification made in the RCL is the removal of the element
that is inserted in the solution.

In this algorithm, the RCL is built using an adaptive procedure. The process
to select the first element of the constructed solution is the same as in
the KLD heuristic, which means that an element is randomly selected from the
RCL built as described in line 4 of Fig. 1.

Let $M_c$ be a partial solution with c elements, $1 \le c < m$, and
$i \in N \setminus M_c$ a candidate to be inserted in the next partial
solution $M_{c+1}$. For each i, we select the $(K - c - 1)$ elements
$j \in N \setminus (M_c \cup \{i\})$ that present the largest values of
$d_{ij}$ and calculate the sum of these $(K - c - 1)$ values of $d_{ij}$,
obtaining $s_i$. To select the next element to be inserted, an initial
candidate list is created based on the greedy function $gf(i)$ shown in (1),
where the first term corresponds to the sum of distances from the candidate i
to the elements $j \in M_c$, and the second term stands for the sum of
distances from element i to the $(K - c - 1)$ elements that are not in the
solution $M_c$ and present the largest distances to i. The initial candidate
list is formed by the elements i, sorted in descending order with respect to
$gf(i)$, and the first K elements are selected from this list to build the
RCL.

$$gf(i) = \sum_{j \in M_c} d_{ij} + s_i \qquad (1)$$
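The greedy function (1) can be sketched as follows. The names `greedy_value` and `build_rcl_v2` are hypothetical, and clamping K - c - 1 at zero is an assumption made here for the case where the partial solution has grown to K - 1 elements or more.

```python
def greedy_value(d, partial, i, K):
    """gf(i): distances from candidate i to the partial solution, plus the
    sum of its (K - c - 1) largest distances to elements outside it."""
    c = len(partial)
    inside = sum(d[i][j] for j in partial)
    outside = sorted(
        (d[i][j] for j in range(len(d)) if j != i and j not in partial),
        reverse=True,
    )
    take = max(K - c - 1, 0)  # assumption: never a negative count
    return inside + sum(outside[:take])

def build_rcl_v2(d, partial, K):
    """First K candidates in descending order of the greedy function."""
    candidates = [i for i in range(len(d)) if i not in partial]
    ranked = sorted(candidates,
                    key=lambda i: greedy_value(d, partial, i, K),
                    reverse=True)
    return ranked[:K]
```

Unlike in KLD, this list has to be rebuilt after every insertion, which is consistent with the higher construction cost reported for KLD-v2.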

The Reactive GRASP and the construction filter are implemented in the same
way as in KLD. Since this construction algorithm demands much more execution
time than the KLD algorithm, only 2 solutions, instead of 400, are generated
to be filtered.



Most distant insertion heuristic (MDI). Let $M_c$ be a partial solution with
c elements ($1 \le c \le m$). The partial solution $M_1$ is obtained by
randomly selecting an element from all elements $i \in N$.

The second element $m_2$ is the element j that presents the largest distance
$d_{ij}$, $i \in M_1$, $j \in N \setminus M_1$. To obtain $M_c$ ($c \ge 3$)
from $M_{c-1}$, the element to be inserted in the solution is randomly
selected from an RCL. The RCL is built based on the function $dsum(j)$ shown
in (2), where the first term corresponds to the sum of distances between all
elements $i \in M_{c-1}$, and the second term is the sum of distances from all
elements $i \in M_{c-1}$ to a candidate j that is not in the partial solution
$M_{c-1}$.

$$dsum(j) = \sum_{1 \le v \le c-2}\ \sum_{v+1 \le w \le c-1} d_{vw} + \sum_{1 \le v \le c-1} d_{vj} \qquad (2)$$

An initial candidate list (ICL) is created containing the elements
$j \in N \setminus M_{c-1}$, sorted in descending order by their $dsum(j)$
values. The first $\alpha \times n$ elements of ICL are selected to form the
RCL.
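A minimal sketch of the MDI candidate list follows. The names are hypothetical, and truncating α × n to an integer with a minimum of one element is an assumption, since the rounding rule is not stated.

```python
def dsum(d, partial, j):
    """Distances inside the partial solution plus the distances from
    every element of the partial solution to candidate j."""
    members = sorted(partial)
    internal = sum(d[v][w]
                   for a, v in enumerate(members)
                   for w in members[a + 1:])
    to_candidate = sum(d[v][j] for v in members)
    return internal + to_candidate

def build_rcl_alpha(d, partial, alpha):
    """First alpha * n candidates in descending order of dsum."""
    n = len(d)
    icl = sorted((j for j in range(n) if j not in partial),
                 key=lambda j: dsum(d, partial, j),
                 reverse=True)
    size = max(1, int(alpha * n))  # assumption: truncate, keep at least one
    return icl[:size]
```

Since the first term of dsum is the same for every candidate, the ordering of the ICL is determined by the second term alone.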

For this algorithm, the reactive GRASP is implemented in the same way as
for the K larger distances heuristic. The first block $B_1 = 0.4\,m\_it$ is
divided into four intervals of the same size and four values
$\alpha \in \{\alpha_1, \alpha_2, \alpha_3, \alpha_4\}$ are evaluated. Table 3
shows the values of $\alpha$ used for each interval. The values $\alpha_i$,
$i = 1, \ldots, 4$, are evaluated by calculating the mean diversity value
$zm_i = \sum_{1 \le q \le 0.1\,m\_it} z(sol_{iq})$ for the solutions
$sol_{iq}$, $i = 1, \ldots, 4$; $q = 1, \ldots, 0.1\,m\_it$, obtained using
each $\alpha_i$. The values $\alpha_i$ are stored in a list $L\alpha$ ordered
by their $zm_i$ values.



Table 3. α values for block B_1

i    c_i                              α
1    [1, ..., 0.1 m_it]               0.03
2    (0.1 m_it, ..., 0.2 m_it]        0.05
3    (0.2 m_it, ..., 0.3 m_it]        0.07
4    (0.3 m_it, ..., 0.4 m_it]        0.1



The next block of iterations $B_2 = 0.6\,m\_it$ is also divided into four
intervals $v_i$, each with a distinct number of iterations, and a value of
$\alpha$ is associated with each one, as shown in Tab. 4.

We have also implemented the same procedure described above for filtering 
the constructed solutions. In this case, the number of solutions generated is n, 
so it depends on the population size of each instance. 

Table 4. α values for block B_2 (lα_i is the i-th value in list Lα)

i    v_i                              α
1    [0.4 m_it, ..., 0.64 m_it]       lα_1
2    (0.64 m_it, ..., 0.82 m_it]      lα_2
3    (0.82 m_it, ..., 0.94 m_it]      lα_3
4    (0.94 m_it, ..., m_it]           lα_4



Figure 2 shows the construction phase procedure using the MDI heuristic.
In line 1, we initialize the value of the best solution found. The value
$\alpha$ to be used to build the RCL is calculated by the procedure det_α in
line 2. This procedure selects $\alpha$ based on the reactive GRASP discussed
before. In line 3, the number of solutions found for a specific $\alpha$ is
updated and, in line 4, the set that contains the candidates to be inserted in
the solution is initialized to contain all elements belonging to N. From line
5 to line 24, the construction procedure is executed max_sol_filter times and
only the best solution is returned to be used as an initial solution by the
local search procedure. In line 7, the first element is selected and, from
line 8 to line 12, we determine the second element of the solution. From line
14 to line 19, the insertion of the other elements is performed. For each
iteration, in line 15, an RCL is built and, in line 16, an element is randomly
selected from it. In line 18, we update the candidates to be inserted in the
next iteration. In lines 20 to 23, we update the best solution found by the
construction procedure. The cost of the best solution found using the selected
$\alpha$ is stored in line 25. When the first block $B_1$ of iterations
finishes, the values $\alpha_i$ are evaluated and put in the list $L\alpha$ in
line 27.



procedure constr_MDI(it_GRASP, m_it, numsol, n, m)
 1. best_cost_sol ← 0;
 2. α ← det_α(it_GRASP, m_it, Lα, i);
 3. numsol[i] ← numsol[i] + 1;
 4. N_RCL ← N;
 5. for j = 1, ..., max_sol_filter do
 6.     sol ← {};
 7.     Randomly select an individual m_1 from N; sol ← sol ∪ {m_1};
 8.     for all j ∈ N\M_1 do
 9.         Compute d_{m_1 j};
10.         m_2 ← j | d_{m_1 j} = max{d_{m_1 j}}, j ∈ N\M_1;
11.         sol ← sol ∪ {m_2};
12.     end for all;
13.     N_RCL ← N - M_2;
14.     for k = 3, ..., m do
15.         RCL ← Build_RCL_α(N_RCL, α);
16.         Randomly select an individual e* from RCL;
17.         sol ← sol ∪ {e*};
18.         N_RCL ← N_RCL - {e*};
19.     end for;
20.     if (z(sol) > best_cost_sol) then do
21.         sol_constr ← sol;
22.         best_cost_sol ← z(sol);
23.     end if;
24. end for;
25. sol_eval[i, numsol[i]] ← z(sol_constr);
26. if (it_GRASP == 0.4 m_it) then do
27.     Lα ← Build_Lα(sol_eval);
28. end if;
29. return sol_constr;



Fig. 2. Construction procedure used to implement the MDI heuristic

2.2 Local Search Phase 

After a solution is constructed, a local search phase should be executed to at- 
tempt to improve the initial solution. In this paper, we use two different local 
search algorithms. The first one was developed by Ghosh [5] and the second one 
by us using the Variable Neighborhood Search (VNS) [8] heuristic. 



Ghosh Algorithm (GhA). The neighborhood of a solution defined by
Ghosh [5] is the set of all solutions obtained by replacing an element in the
solution by another that does not belong to it. The incumbent solution M is
initialized with the solution obtained by the construction phase. For each
$i \in M$ and $j \in N \setminus M$, the improvement due to exchanging i by j,

$$\Delta z(i,j) = \sum_{u \in M \setminus \{i\}} (d_{ju} - d_{iu}),$$

is computed. If $\Delta z(i,j) \le 0$ for all i and j, the local search is
terminated, as no exchange will lead to a better solution. Otherwise, the
elements of the pair (i,j) that provides the maximum $\Delta z(i,j)$ are
interchanged, creating a new incumbent solution M, and the local search is
performed again.
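A best-improvement search over this one-exchange neighborhood can be sketched as below. It is an illustration written for this text, not Ghosh's implementation; the function names are hypothetical.

```python
def swap_gain(d, M, i, j):
    """Change in z(M) when i (in M) is replaced by j (not in M)."""
    return sum(d[j][u] - d[i][u] for u in M if u != i)

def best_improvement_swap(d, initial):
    """Repeat the most profitable single exchange until none improves z."""
    M = set(initial)
    outside = set(range(len(d))) - M
    while True:
        gain, i, j = max(
            ((swap_gain(d, M, i, j), i, j) for i in M for j in outside),
            default=(0, None, None),
        )
        if gain <= 0:
            return M  # local optimum: no exchange improves the solution
        M.remove(i)
        M.add(j)
        outside.remove(j)
        outside.add(i)

# Hypothetical instance: points on a line at positions 0, 1, 2 and 10.
pts = [0, 1, 2, 10]
d = [[abs(a - b) for b in pts] for a in pts]
local_opt = best_improvement_swap(d, {1, 2})
```

Starting from {1, 2}, the search first brings in the outlying point and then the point farthest from it.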

SOM Algorithm (SOMA). We have also implemented a local search
using a VNS heuristic. In this case, we use the GhA algorithm until there is
no more improvement in the solution. After that, we execute a local search
based on a new neighborhood, which is defined as the set of all solutions
obtained by replacing two elements in the solution by another two that
are not in the solution. The incumbent solution M is initialized with the
solution obtained by the first phase of the local search. For each pair (i,j)
of elements in M and each pair (v,w) of elements in $N \setminus M$, the
improvement due to exchanging (i,j) by (v,w),

$$\Delta z((i,j),(v,w)) = \sum_{u \in M \setminus \{i,j\}} (d_{vu} + d_{wu} - d_{iu} - d_{ju}),$$

is computed. If $\Delta z((i,j),(v,w)) \le 0$ for all pairs (i,j) and (v,w),
the local search is terminated, as no exchange will improve the solution.
Otherwise, the pairs (i,j) and (v,w) that provide the maximum
$\Delta z((i,j),(v,w))$ are interchanged, a new incumbent solution M is
created and the local search is performed again.
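The gain of a two-element exchange can be sketched as follows. Note one assumption made here: besides the sum over the elements that stay in the solution, the sketch adds the pair term d_vw - d_ij, which is not visible in the formula as recovered in this text, so that the returned value equals the exact change in z. The function name is hypothetical.

```python
def pair_swap_gain(d, M, out_pair, in_pair):
    """Exact change in z(M) when the pair (i, j) in M is replaced by the
    pair (v, w) taken from outside M."""
    i, j = out_pair
    v, w = in_pair
    rest = [u for u in M if u not in (i, j)]
    cross = sum(d[v][u] + d[w][u] - d[i][u] - d[j][u] for u in rest)
    # Assumption: include the distance inside each pair as well.
    return cross + d[v][w] - d[i][j]
```

A search over this neighborhood has on the order of m² (n - m)² candidate moves per step, which is why it is applied only after the cheaper one-exchange search has converged.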

We developed several GRASP heuristics combining the construction procedures
with the local search strategies described above. The computational
experiments carried out to evaluate the performance of these heuristics are
presented in the next section.

3 Computational Results

We tested nine GRASP procedures that are shown in Tab. 5. 

The first GRASP procedure, G1, is an implementation of the GRASP heuristic
developed by Ghosh, and the second one is a procedure that implements Ghosh's
construction heuristic but uses the new local search SOMA. G9 is the GRASP
heuristic implemented by Andrade et al. [2]. Except for G9, whose code was
kindly provided to us by the authors, all algorithms were implemented by us.

The algorithms were implemented in C++, compiled with the g++ compiler
version 3.2.2, and tested on a PC with an AMD Athlon 1.3 GHz processor and
256 Mbytes of RAM. Twenty instances of the problem were created with
populations of sizes n = 100, n = 200, n = 300, n = 400 and n = 500, and
subsets of sizes m = 10%n, m = 20%n, m = 30%n and m = 40%n. The diversities in
the set $\{d_{ij} : i < j;\ i, j \in N\}$ for each set of instances that have
the same population size were randomly selected from a uniform distribution
over [0 ... 9].

In Tab. 6, we show the results of computing 500 iterations for each GRASP 
heuristic. The first and second columns identify two parameters of each instance: 
the size of the population and the number m of elements to be selected. Each 
procedure was executed three times and for each one we show the average value 
of the solution cost and the best value found. 

We can see that the proposed GRASP heuristics found better solutions than
the GRASP algorithms found in the literature [2,5]. Algorithm G7, which
implements KLD-v2 for the construction phase and GhA for local search, was the
one that found the best solutions for the largest number of instances.

Table 7 reports the CPU times observed for the execution of the same
instances. The first and second columns identify the two parameters of each
instance. For each GRASP heuristic, the average time for three executions and
the time obtained when the best solution was found are reported. Among the
proposed heuristics, algorithm G5 is the most efficient with respect to
execution time. Heuristic G7, for which we have the best quality solutions,
demands more time than G5 but is not the worst one, showing that this
algorithm works very well for this problem.



Table 5. GRASP procedures

GRASP procedure    Construction heuristic    Local search heuristic
G1                 Ghosh                     GhA
G2                 Ghosh                     SOMA
G3                 MDI                       GhA
G4                 MDI                       SOMA
G5                 KLD                       GhA
G6                 KLD                       SOMA
G7                 KLD-v2                    GhA
G8                 KLD-v2                    SOMA
G9                 Andrade                   Andrade

Table 6. Solutions for GRASP heuristics

Table 7. CPU time for GRASP heuristics (seconds)

We performed a deeper analysis of the results obtained for the GRASP
heuristics G1, G5, G6, G7 and G8, which present better solutions and/or
shorter execution times. We selected two instances: the first one has
parameters n = 200 and m = 40, and the second one, n = 300 and m = 90. We
executed each GRASP heuristic until a solution was found with a cost greater
than or equal to a target value. Two target values were used for each
instance: the worst value obtained by these heuristics and an average of the
values generated by them. Empirical probability distributions for the time to
achieve a target value are plotted in Figs. 3 and 4. To plot the empirical
distribution for each variant, we executed each GRASP heuristic 100 times
using 100 different random seeds. In each execution, we measured the time to
achieve a solution whose cost was greater than or equal to the target cost.
The execution times were sorted in ascending order, a probability
$p_i = (i - 0.5)/100$ was associated with each time $t_i$, and the points
$z_i = (t_i, p_i)$ were plotted for $i = 1, \ldots, 100$ [1].

[Two plots; horizontal axis: Time (sec)]

Fig. 3. Comparison of GRASP heuristics for the instance n = 200, m = 40 with
target values 4442 and 4443

[Two plots; horizontal axis: Time (sec)]

Fig. 4. Comparison of GRASP heuristics for the instance n = 300, m = 90 with
target values 20640 and 20693
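The plotting positions used for these empirical distributions can be computed as in the following sketch, which generalizes the 100 runs of the text to any number of runs. The function name `ttt_points` is hypothetical.

```python
def ttt_points(times):
    """Pair the i-th smallest time t_i with probability (i - 0.5) / n."""
    ordered = sorted(times)
    n = len(ordered)
    return [(t, (i - 0.5) / n) for i, t in enumerate(ordered, start=1)]

# Hypothetical times-to-target (seconds) from four independent runs.
points = ttt_points([3.0, 1.0, 2.0, 4.0])
```

Each point gives the estimated probability that a run reaches the target within the corresponding time.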

We compared the proposed GRASP algorithms with the Ghosh algorithm (G1)
by evaluating the average probability that G1 presents when the proposed
GRASP heuristics have probability values equal to 0.9 and 1.0. We obtain
these values from Figs. 3 and 4. For example, we can obtain the probability
values for G1 when G5 has a probability value equal to 0.9. In this case, we
have a value of 0.12 for both target values 4442 and 4443, 0.83 for target
20640, and 0.7 for target 20693. The average of these values is 0.44. We
evaluated these average values for G5, G6, G7 and G8, and the results are
presented in Tab. 8.



Table 8. Comparison of convergence of solutions

probability    G1-G5    G1-G6    G1-G7    G1-G8
0.9            0.44     0.5      0.7      0.75
1.0            0.6      0.67     0.91     0.95




We can see that although the algorithm G1 presents a good convergence to 
the target values, the proposed algorithms G5, G6, G7 and G8 were able to 
improve this convergence. 

4 Concluding Remarks 

This paper presented some versions of GRASP heuristic to solve the maximum 
diversity problem (MDP). The main goal of this work was to analyse the influence 
of the construction and local search heuristics on the performance of GRASP 
techniques. 

Experimental results show that the versions that use the KLD or KLD-v2
construction algorithms and the GhA or SOMA local search algorithms (G5, G6,
G7 and G8) significantly improve the average performance of the best GRASP
approaches proposed in the literature (G1 and G9).

Our experiments also show that if the execution time is restricted (limited
to a small value), version G5 is a good choice since it obtains reasonable
results faster (see Figs. 3 and 4). On the other hand, if the execution time
is not an issue, versions G7 and G8 tend to produce the best solutions (see
Tabs. 6 and 7).



Abstract. The increasing latency gap between processor and memory
speeds has made it imperative for algorithms to reduce expensive
accesses to main memory. In earlier work, we presented cache-conscious
algorithms for sorting strings, which have been shown to be almost two
times faster than the previous algorithms, mainly due to better usage of
the cache. In this paper, we propose two new algorithms, Burstsort and
MEBurstsort, for sorting large sets of integer keys. Our algorithms use a
novel approach for sorting integers, by dynamically constructing a compact
trie which is used to allocate the keys to containers. These keys are
then sorted within the cache. The new algorithms are simple, fast and
efficient. We compare them against the best existing algorithms using
several collections and data sizes. Our results show that MEBurstsort is
up to 3.5 times faster than memory-tuned quicksort for 64-bit keys and
up to 2.5 times faster for 32-bit keys. For 32-bit keys, on 10 of the 11
collections used, MEBurstsort was the fastest, whereas for 64-bit keys,
it was the fastest for all collections.



1 Introduction 

Sorting is one of the fundamental problems of computer science. Many appli- 
cations are dependent on sorting, mainly for reasons of efficiency. It is also of 
great theoretical importance: several advances in data structures and algorithmic 
analysis have come from the study of sorting algorithms [6]. 

In recent years, CPU speed has increased by about 60% per year,
but the speed of access to main memory has improved by only 7% per year [3].
Thus, there is an increasing latency gap between the processor and main
memory, and it appears that this trend is likely to continue. To reduce
this problem, hardware developers have introduced hierarchies of memories - 
caches - between the processor and main-memory. The closer caches are to the 
processor, the smaller, faster and more expensive they get. Caches utilise the 
locality, temporal and spatial, inherent in most programs. As programs do not 
access all code or data uniformly, having those frequently accessed items closer 
to the processor is an advantage. 

The prevalent approach to developing algorithms assumes the RAM model
[1,3,8], where all accesses to memory are given a unit cost. The focus is
mainly on reducing the number of instructions used. As a result, many
algorithms are tuned towards lowering the instruction count. They may not be
particularly efficient for memory hierarchies, however.

Fig. 1. Burstsort, with five trie nodes, five containers and ten keys. The
keys are 0100aeaf_h, 0100cbcc_h, 0100dadc_h, 01feac04_h, 01feac04_h,
01feac04_h, 01feac04_h, fc00ceac_h, fc00c7c9_h and fc04f0eb_h.

Recently, there has been much work done on cache-conscious sorting [5,9,10, 
12,14,15]. In 1996, LaMarca and Ladner [7] analyzed algorithms such as merge- 
sort, heapsort, quicksort and LSB radixsort in terms of cache misses and instruc- 
tions and reported that it was practical to make them more efficient by improving 
the locality even at the cost of using more instructions. Their memory tuned al- 
gorithms are often used as a reference for comparing sorting algorithms. Since 
then, several cache-tuned implementations for well known algorithms have been 
developed, such as, Tiled-mergesort, Multi-mergesort, Memory-tuned quicksort, 
PLSB, EPLSB, EBT and CC-Radix. The most efficient of these algorithms are 
used in our experiments and considered in further detail later. 

Tries are a much used data structure for algorithms on string keys and
have been widely used for searching applications. But recently, Burstsort
[12,13], based on burst tries [2], has shown excellent performance for sorting
string keys, primarily by using the CPU cache more effectively. A burst trie
is a collection of small data structures, called containers, that are accessed
by a normal trie. The keys are stored in the containers and the first few
bytes of the keys are used to construct the access trie.

Fig. 2. MEBurstsort, with five trie nodes, three containers and ten keys. The
keys are 0100aeaf_h, 0100cbcc_h, 0100dadc_h, 01feac04_h, 01feac04_h,
01feac04_h, 01feac04_h, fc00ceac_h, fc00c7c9_h and fc04f0eb_h.

In this paper, we propose two new algorithms, Burstsort and MEBurstsort,
both based on a similar approach of using compact tries, but for integer keys.
We believe this is the first case of using tries for the purpose of sorting
integers. They reduce the number of times that keys need to be accessed from
main memory. This approach yields better dividends when the key size is
increased from 32 to 64 bits. We use several collections and data sizes to
measure the performance of the sorting algorithms. Both artificially generated
and real-world web collections are used in the experiments. The experiments
include measuring the running time of the algorithms and performing cache
simulations to measure the number of instructions and cache misses.

Our results show that both Burstsort and MEBurstsort make excellent use
of the cache while using a low number of instructions. Indeed, they incur only
20% of the cache misses of MTQsort, while the number of instructions is
similar to that of the efficient radix sorts. For 32-bit keys, MEBurstsort is
the fastest for 10 of the 11 collections used and shows by far the best
performance for skewed data distributions. For 64-bit keys, Burstsort is
comparable in performance to Sequential Counting Split radix sort (which we
believe is the fastest existing algorithm for 64-bit keys), whereas
MEBurstsort is the fastest for all collections. These results show that our
approach of using a compact trie is practical, effective and adapts well to
varied distributions and key sizes.



2 Background 

Much of the effort on developing cache-efficient sorting algorithms has
focused on restructuring existing techniques to take advantage of internal
memory hierarchies. Based on preliminary experiments on 32-bit keys, only the
most efficient of the integer sorting algorithms have been included in this
paper. Memory-Tuned Quicksort has been included for reference purposes. The
algorithms discussed below can be classified into two main groups: Quicksort
and Radixsort. Radixsort approaches can be further grouped into
Least-Significant-Bit, Most-Significant-Bit and Hybrid.



Quicksort 

Quicksort is a dominant sorting routine based on the divide-and-conquer ap- 
proach. It is an in-place algorithm and there are several variants on the basic 
scheme. LaMarca and Ladner [7] analyzed quicksort for cache-efficiency and 
suggested using insertion sort on small partitions when they are resident in the 
cache, instead of a final single pass over the entire data. This memory-tuned 
implementation of quicksort has become a standard reference for comparing 
cache-conscious sorting algorithms. For our experiments, we have used the im- 
plementation of LaMarca and Ladner [7] and labelled it as MTQsort. We have 
also used an implementation by Xiao et al. [15] and labelled it as Qsort. 



Radix Sort 

Least Significant Bit (LSB). Sedgewick [11] reports that the LSB approach 
is widely used due to its suitability for machine-language implementation as it 
is based on very simple data structures. 

In LSB, the keys are sorted digit-by-digit starting from the least significant 
digits. The digits are usually sorted using counting sort, which has three phases 
in each pass: a count phase, a prefix sum phase, and a permute phase. Two 
arrays, source and destination (both the size of the dataset), and an auxiliary 
count array (dependent on the size of the alphabet) are used for this purpose. 
The count phase involves counting the number of keys that have identical digits 
at the position under consideration. The prefix sum phase is used to count the 
number of keys falling in a class and calculating the starting position for each 
class in the destination array. The permute phase involves moving the keys from 
the source array to the destination array using the count array as an index to 
the new location. Each pass requires that the source array is traversed twice, 
once each during the count phase and permute phase. The destination array is 
traversed once during the permute phase. 
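The three phases of each pass can be sketched as follows. This is a plain illustration of LSB radix sort with counting sort, written for non-negative integer keys; it is not one of the tuned implementations compared in this paper, and the function name and default parameters are assumptions.

```python
def lsb_radix_sort(keys, key_bits=32, radix_bits=11):
    """Sort non-negative integers digit by digit, least significant first."""
    mask = (1 << radix_bits) - 1
    src = list(keys)
    dst = [0] * len(src)
    for shift in range(0, key_bits, radix_bits):
        # Count phase: number of keys holding each digit value.
        count = [0] * (1 << radix_bits)
        for k in src:
            count[(k >> shift) & mask] += 1
        # Prefix sum phase: starting position of each class in dst.
        total = 0
        for digit in range(len(count)):
            count[digit], total = total, total + count[digit]
        # Permute phase: move each key to its class, preserving order.
        for k in src:
            digit = (k >> shift) & mask
            dst[count[digit]] = k
            count[digit] += 1
        src, dst = dst, src
    return src
```

The permute phase writes to a different region of the destination array for nearly every key, which is the source of the cache misses that the variants discussed next try to reduce.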

A simple modification to improve the locality of each pass is to pre-sort
small segments to group together keys with equal values. This approach was
implemented by Rahman and Raman [10] and named Pre-sorting LSB radix
sort. Each pass involves two sorts, a pre-sort and a global sort, and counting
sort is used for both. Although pre-sorting improves locality, the algorithm
remains multi-pass, so the number of passes increases with the size of the
key. For our experiments, we have used the implementation of Rahman and
Raman [10] and labelled it as PLSB.

The permute phase is not cache efficient due to the random accesses to the
destination array. To reduce these misses, Rahman and Raman [10] used a buffer
to store keys in the same class and copy them in a block to the destination
array. For our experiments, we have used the implementation of Rahman and
Raman [10] and labelled it as EBT.



Table 1. Architectural parameters of the machine used for the experiments.

Workstation                   Pentium
Processor type                Pentium III Xeon
Clock rate (MHz)              700
L1 cache (KBytes)             16
L1 block size (Bytes)         32
L1 associativity              4
L1 miss latency (cycles)      6
L2 cache (KBytes)             1,024
L2 block size (Bytes)         32
L2 associativity              8
L2 miss latency (cycles)      109
TLB entries                   32
TLB miss latency (cycles)     5
Memory Size (MBytes)          2,048


Extended-radix PLSB was developed to exploit the increase in locality offered 
by pre-sorting. This helps to reduce the number of passes without incurring 
many cache and TLB misses. We have used the implementation of Rahman and 
Raman [10] and labelled it as EPLSB. 

For all the above variants of LSB radix sorts, we have used a radix size of 11 
as it was found to be the most efficient for 32-bit keys and has also been used in 
previous such studies [10,5]. For some algorithms, a radix of 13 for 64-bit keys 
was found to be up to 10% faster due to the reduced number of passes. 
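The effect of the radix size on the number of passes can be checked with a small calculation, assuming one pass per digit and reading the radix size as the number of bits per digit. The helper name `lsb_passes` is hypothetical.

```python
def lsb_passes(key_bits, radix_bits):
    """Number of digits, and hence LSB passes, needed to cover the key."""
    return -(-key_bits // radix_bits)  # ceiling division on integers

passes_32_r11 = lsb_passes(32, 11)  # 3 passes
passes_64_r11 = lsb_passes(64, 11)  # 6 passes
passes_64_r13 = lsb_passes(64, 13)  # 5 passes
```

Under this assumption, moving from a radix of 11 to 13 saves one of six passes for 64-bit keys, at the price of a count array four times larger.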

Most Significant Bit. This is a type of distribution sorting where the classes 
are formed based on the value of the most significant bit. Depending upon the 
number of keys in each class, it proceeds recursively or uses a simple algorithm, 
such as insertion sort, to sort very small classes [6]. We have used the implemen- 
tation of Rahman and Raman [9] and labelled it as MSBRadix. It is a multi-pass 
algorithm and unstable for equal keys. Based on the number of keys in a class, 
the radix is varied from a maximum of 16 to a minimum of 2 [9]. 

Hybrid. A hybrid approach uses both LSB and MSB methods. In Cache-Conscious
radix sort (CCRadix), data sets that do not fit within the cache (or the
memory mapped by the TLB) are sorted on the most significant digit to
dynamically divide the data set into smaller partitions that fit within the
cache. Data sets smaller than the cache are sorted using LSB radix sort. Based
upon data skew and size of the digit, there could be several reverse sorting
calls. As noted by Jimenez-Gonzalez et al. [4], CCRadix does not scale to
64-bit keys. The parameters that showed the best performance for the uniform
random distribution collection (Random31) were chosen for our experiments. We
have used the implementation of Jimenez-Gonzalez et al. [5] and labelled it as
CCRadix.



Table 2. Random31 collection, 32-bit keys, sorting time for each method
(milliseconds).

         Qsort    MTQsort  PLSB     EPLSB    EBT      MSBRadix  CCRadix  Burstsort  MEBurstsort
Set 6    21,149   17,829   14,824   13,566   10,710   12,360    10,054   12,437     13,678
Set 7    43,972   37,566   29,344   26,810   21,272   26,020    19,873   27,716     27,199



Table 3. Random20 collection, 32-bit keys, sorting time for each method
(milliseconds).

               Data set
               Set 1    Set 2    Set 3    Set 4    Set 5    Set 6    Set 7
Qsort          471      982      2,070    4,382    9,130    18,985   39,477
MTQsort        374      833      1,725    3,666    7,738    15,776   32,678
PLSB           456      916      1,832    3,668    7,333    14,666   29,157
EPLSB          423      839      1,675    3,341    6,674    13,319   26,275
EBT            317      634      1,268    2,541    5,099    10,193   20,137
MSBRadix       296      645      1,485    3,405    6,943    14,275   28,283
CCRadix        411      861      1,808    3,665    7,359    14,830   29,926
Burstsort      327      677      1,435    2,493    5,095    9,361    19,345
MEBurstsort    336      707      1,288    2,313    4,546    8,729    17,442

SCSRadix is used to sort 64-bit integer keys. It dynamically detects if a
subset of the data has skew and skips the sorting of the subset. Based on the
number of keys (n) and the number of bits that remain to be sorted (b), it
chooses between insertion sort, CCRadix, LSB and Counting Split. Counting
Split is used for partitioning the dataset into smaller sub-buckets of similar
size, whereupon, depending upon n and b, the other three algorithms are used.
We believe this is the fastest sorting routine for 64-bit keys. We have used
the implementation of Jimenez-Gonzalez et al. [4] and labelled it as SCSRadix.



3 Sorting Integers with Compact Tries 



Traditionally, trie data structures have been used for managing variable-length
string keys and found applications in dictionary management, text compression
and pattern matching [2]. In our earlier work, we investigated the practicality
of using burst tries [2], a compact and efficient variant of tries, for sorting
strings. In this section, we describe a similar approach for the purpose of
sorting integer keys in a cache-efficient manner.



Table 4. Binomial collection, 32-bit keys, sorting time for each method
(milliseconds).

               Data set
               Set 1    Set 2    Set 3    Set 4    Set 5    Set 6    Set 7
Qsort          270      587      1,295    2,772    5,926    12,713   26,723
MTQsort        191      431      964      2,129    4,647    10,316   21,405
PLSB           452      908      1,815    3,626    7,259    14,489   28,805
EPLSB          415      822      1,644    3,307    6,554    13,632   25,784
EBT            298      596      1,193    2,389    4,772    9,537    18,821
MSBRadix       276      542      1,075    2,139    4,300    8,523    17,037
CCRadix        660      1,327    2,656    5,309    10,635   21,285   41,821
Burstsort      284      529      1,009    1,954    3,846    7,656    15,176
MEBurstsort    179      322      598      1,140    2,223    4,385    8,704



Table 5. Pascal collection, 64-bit keys, sorting time for each method (milliseconds).

Data set     Set 1  Set 2  Set 3  Set 4   Set 5   Set 6
Qsort          838  1,842  4,018   8,570  18,201  39,197
MTQsort        662  1,487  3,292   7,061  14,933  31,761
PLSB         1,308  2,613  5,249  10,455  20,941  41,851
EPLSB        1,219  2,436  4,849   9,709  19,329  38,602
EBT          1,156  2,314  4,623   9,249  18,487  36,987
MSBRadix     1,019  2,012  3,994   7,963  15,910  31,813
SCSRadix       342    686  1,381   2,776   5,559  10,974
Burstsort      478    879  1,718   3,380   6,726  13,395
MEBurstsort    355    646  1,251   2,448   4,814   9,527



Burstsort 

The main principle behind Burstsort is to minimize the number of times that the
keys need to be accessed from main memory. This is achieved by dynamically
constructing a compact trie that rapidly places the keys into containers. It divides
the dataset based on both the data distribution and the size of the cache. Burstsort
is a variant of most-significant-bit radixsort and needs to read the distinguishing
bits in each key at most once. Our earlier work on string keys [12,13] has shown
that such an approach has excellent performance.

Table 6. Random63 collection, 64-bit keys, sorting time for each method (milliseconds).

Data set     Set 1  Set 2  Set 3  Set 4   Set 5   Set 6
Qsort        1,106  2,496  5,162  11,630  23,390  51,780
MTQsort        926  1,958  4,499   9,823  20,660  44,932
PLSB         1,369  2,749  5,487  11,007  22,564  44,005
EPLSB        1,268  2,529  5,039  10,104  20,184  40,463
EBT          1,180  2,363  4,729   9,499  18,946  38,693
MSBRadix     1,028  2,574  4,638   8,694  16,957  34,313
SCSRadix       549  1,189  2,694   5,600  11,286  22,708
Burstsort      556  1,046  2,060   4,365  10,377  22,003
MEBurstsort    538  1,037  2,050   4,288   9,569  18,649

There are two phases in the construction of a burst trie: insertion and traversal.
The insertion phase inserts the keys into the trie. It is a single-pass traversal
through the source array. The trie can grow in three ways: creation of a new
trie node, an increase in the size of an existing container, and creation of a new
container. When a container becomes too large, it is burst, resulting in the
creation of a new trie node and new child containers. Once all the keys have been
inserted, an in-order traversal of the trie is performed. The containers having
more than one key are sorted on those bits that have not yet been read. We
have used MSBRadix for sorting containers due to its lower instruction count
as compared to MTQsort. MSBRadix operates in-place, thus making full use of
the L2 cache. The usage of other algorithms for sorting containers needs to be
explored further; it would depend upon the depth of the memory hierarchy and
their inherent latencies.
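
The two phases described above can be illustrated with a short sketch. The Python below is not the authors' C implementation; it is a minimal illustration that assumes unsigned 32-bit keys, an 8-bit radix and a very small burst threshold, and it uses Python's built-in `sorted` as a stand-in for MSBRadix when sorting containers. The names `TrieNode`, `byte_at`, `insert` and `burstsort` are hypothetical.

```python
KEY_BYTES = 4  # assumption: unsigned 32-bit keys, one byte consumed per trie level


class TrieNode:
    """One trie node: 256 slots, each empty, a container (list), or a child node."""

    def __init__(self):
        self.slots = [None] * 256


def byte_at(key, level):
    """Return the byte of `key` examined at trie level `level` (0 = most significant)."""
    return (key >> (8 * (KEY_BYTES - 1 - level))) & 0xFF


def insert(node, key, level, threshold):
    """Place `key` in a container, bursting the container if it grows too large."""
    while True:
        b = byte_at(key, level)
        slot = node.slots[b]
        if isinstance(slot, TrieNode):       # descend to the next level
            node, level = slot, level + 1
            continue
        if slot is None:                     # first key seen for this byte value
            slot = node.slots[b] = []
        slot.append(key)
        # Containers at the lowest level hold identical keys and never burst.
        if len(slot) >= threshold and level < KEY_BYTES - 1:
            child = TrieNode()               # burst: the container becomes a node
            node.slots[b] = child
            for k in slot:
                insert(child, k, level + 1, threshold)
        return


def burstsort(keys, threshold=3):
    """Insertion phase followed by an in-order traversal of the trie."""
    root = TrieNode()
    for key in keys:
        insert(root, key, 0, threshold)
    out = []

    def traverse(node):
        for slot in node.slots:
            if isinstance(slot, TrieNode):
                traverse(slot)
            elif slot:
                out.extend(sorted(slot))     # stand-in for the container sort

    traverse(root)
    return out
```

Each key is read once during insertion and again only when its container bursts or is sorted, which is the access pattern the text credits for the low number of cache misses. A realistic threshold would be tied to the cache size rather than the value 3 used here.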

The data structures used for the trie nodes and the containers are arrays. The
trie node structure is composed of four elements: a pointer to a trie node or
container, a counter of keys in the container, a level counter for growing the
container size, and a tail-pointer for the lowest level in the trie hierarchy.
The tail-pointer is used to insert the keys at the end of the containers in order
to maintain stability, though the current version is unstable because it uses an
unstable MSBRadix for sorting the containers.

For our experiments, the size of the container is restricted by the size of the 
L2 cache and is determined by the ratio of cache size to size of the key. Instead 
of allocating the space for the containers all at once, the container grows (using 
the realloc function call) from 16 to 262,144 for 32-bit keys and 16 to 131,072 
for 64-bit keys. They are grown by a factor of 4, for example, 16, 64, 256, 1,024, 
4,096, 16,384, 65,536 and 262,144 for 32-bit keys. Containers used in the lowest 
level are a linked list of arrays of size 128; the keys in these containers are not 
sorted as they are identical. A radix size of 8 bits has been used for the trie 
nodes. 
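
The growth schedule can be worked through numerically. The sketch below assumes a 1 MB L2 cache (the size quoted in the figure captions) and assumes that the final growth step is capped at the cache-derived limit, since 131,072 is not 16 times a power of 4; neither assumption is stated explicitly in the text, and `container_capacities` is a hypothetical helper.

```python
def container_capacities(cache_bytes, key_bytes, start=16, factor=4):
    """List the successive container capacities (in keys) up to the cache limit."""
    limit = cache_bytes // key_bytes      # ratio of cache size to key size
    capacities = []
    capacity = start
    while capacity < limit:
        capacities.append(capacity)
        capacity *= factor                # grow by a factor of 4
    capacities.append(limit)              # assumed: last step is capped at the limit
    return capacities


print(container_capacities(1 << 20, 4))   # 32-bit keys
print(container_capacities(1 << 20, 8))   # 64-bit keys
```

Under these assumptions the 32-bit schedule ends at 262,144 keys and the 64-bit schedule at 131,072 keys, matching the limits given above.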

An example of Burstsort for 32-bit keys is shown in Figure 1; each node is
composed of 256 characters. The 32-bit integer keys are represented by hexadecimal
numbers and stored in their entirety in the containers. The keys are 0100aeaf_h,
0100cbcc_h, 0100dadc_h, 01feac04_h, 01feac04_h, 01feac04_h, 01feac04_h,
fc00ceac_h, fc00c7c9_h and fc04f0eb_h. The threshold value is assumed to be
three, implying that a container bursts when it holds three keys. The lowest
level is a linked list of arrays of size three.

Table 7. Web collection, 32-bit and 64-bit keys, Set 6, sorting time for each method
(milliseconds).

        Qsort   MTQsort  PLSB    EPLSB   EBT     MSBRadix  CC/SCS Radix  Burstsort  MEBurstsort
32-bit  15,038  11,495   15,038  12,971   9,793   9,063    20,759        10,205      5,192
64-bit  43,075  33,205   41,307  37,791  35,935  32,568    11,966        16,911     10,286



MEBurstsort (Memory Efficient Burstsort) 

In Burstsort, the keys are stored in full. This may lead to redundancy as some of
the information pertaining to each key is implicitly stored in the trie. Thus,
MEBurstsort was developed for the purpose of eliminating this redundancy.
Only that portion of the key which has not already been read, and thus cannot
be gathered from a traversal of the trie, is stored in the containers. Once all
the keys have been inserted, the trie is traversed depth-first, and the keys in each
container are reproduced in full and written back to the source array, where they
are then sorted using the container sorting algorithm. A container at level x holds
sizeof(key) - (x + 1) bytes of each key, where the root node is assumed to be at
level 0. The linked lists of arrays present at the lowest level in Burstsort have
been eliminated; only the counters are required to keep track of the keys. As
a result, it is not a stable algorithm, but stability is of significance only when
sorting pointers to records.

This compressed storage saves space in the containers, so MEBurstsort uses much
less memory than Burstsort. Smaller containers make it more cache-friendly. But
as the keys are treated as variable-length byte sequences, bursting requires
copying a key byte-by-byte, thus requiring more instructions. MEBurstsort has been
observed to perform better than Burstsort for all collections.
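
The storage rule for MEBurstsort containers can be sketched as follows. This is an illustration only, assuming 32-bit keys with one byte consumed per trie level; `stored_suffix` and `rebuild` are hypothetical helper names, not functions from the authors' code.

```python
KEY_BYTES = 4  # assumption: 32-bit keys, one byte consumed per trie level


def stored_suffix(key, level):
    """Bytes kept in a container at trie level `level` (root = level 0).

    The first level + 1 bytes are implied by the path through the trie,
    so only KEY_BYTES - (level + 1) bytes remain to be stored.
    """
    return key.to_bytes(KEY_BYTES, "big")[level + 1:]


def rebuild(path, suffix):
    """Recombine the bytes implied by the trie path with the stored suffix."""
    return int.from_bytes(bytes(path) + suffix, "big")


suffix = stored_suffix(0x0100AEAF, 0)      # three bytes are stored at the root level
assert rebuild([0x01], suffix) == 0x0100AEAF
```

At the lowest level the suffix is empty, which is consistent with the statement that only counters are needed there.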

An example of MEBurstsort is shown in Figure 2. The integer keys are represented
by hexadecimal numbers. They are 0100aeaf_h, 0100cbcc_h, 0100dadc_h,
01feac04_h, 01feac04_h, 01feac04_h, 01feac04_h, fc00ceac_h, fc00c7c9_h
and fc04f0eb_h. The threshold value is assumed to be three. As can be seen from
the figure, the bytes of each key that have already been read are not stored in
the containers.

4 Experiments 

Several collections with a wide range of characteristics and sizes have been used 
in our experiments. Many of these collections have been used in previous such 
studies [15]. The collections are briefly described below. 

Random63. uniformly distributed integers in the range 0 to 2^63 - 1 and
generated using the random number generator random() from the C library.
Random31. uniformly distributed integers in the range 0 to 2^31 - 1 and
generated using the random number generator random() from the C library.
Random20. uniformly distributed integers in the range 0 to 2^20 - 1 and
generated using the random number generator random() from the C library.

Fig. 3. Time/Key, 64-bit keys. Upper: Pascal, Lower: Sorted.



Equilikely. composed of integers in a specified range.
Bernoulli. a discrete probability distribution composed of integers 0 or 1.
Geometric. a discrete probability distribution composed of integers 0, 1, 2, ...
Pascal. a discrete probability distribution composed of integers 0, 1, 2, ...
Binomial. a discrete probability distribution composed of integers 0, 1, 2, ..., N.
Poisson. a discrete probability distribution composed of integers 0, 1, 2, ...
Zero. composed entirely of 0s.
Sorted. distinct integers sorted in ascending order.
Web. integers in order of occurrence and drawn from a large web collection.

For 32-bit keys, there are seven sets, designated as Set 1, Set 2, Set 3,
Set 4, Set 5, Set 6 and Set 7. They represent data sizes from 1x1024x1024
to 64x1024x1024 keys. For 64-bit keys, there are six sets of sizes ranging from
1x1024x1024 to 32x1024x1024 keys. The set sizes double from one set to the next.

The goal of the experiments is to compare the performance of our algorithms
with some of the best known algorithms in terms of running time, number of
instructions and L2 cache misses. The implementations of the algorithms used in
our experiments were assembled from the best sources we could identify, and we are
confident that they are of high quality. All the algorithms were written in C.

The task is to sort an array of integers; the array is returned as output. The
CPU time is measured by using the Unix function gettimeofday(). The cost of
generating the data collections, in terms of time, number of instructions and L2
cache misses, is not included in the results reported here. The configuration of
the machine used for the experiments is presented in Table 1; calibrator^1
was used to measure some of the machine parameters. An open-source cache
simulator, valgrind^2, was used for simulating the cache with the configuration
of our machine. The experiments were performed on a Linux operating
system using the GNU gcc compiler with the highest compiler optimization, -O3.
The experiments were performed under light load, that is, no other significant
processes were running.
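
The measurement protocol, in which only the sort itself is timed and data generation is excluded, can be sketched as below. The experiments themselves use gettimeofday() from C; this Python illustration substitutes `time.perf_counter` as a stand-in clock, and `time_sort_ms` is a hypothetical helper.

```python
import time


def time_sort_ms(sort_fn, keys):
    """Time one call of `sort_fn` on a private copy of `keys`, in milliseconds."""
    data = list(keys)                      # preparing the input is not timed
    start = time.perf_counter()
    result = sort_fn(data)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms


result, elapsed = time_sort_ms(sorted, [3, 1, 2])
```

Dividing `elapsed` by the number of keys gives the normalized time per key used in the graphs.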



5 Results 

All the graphs showing times, instructions and cache misses have been nor- 
malized by dividing by the number of keys. The timings in the tables are in 
milliseconds and are shown unnormalized. 

In agreement with Jimenez-Gonzalez et al. [5], we found CCRadix to be the
fastest sorting algorithm for 32-bit keys on the Random31 collection shown in
Table 2. However, the performance of CCRadix is seen to deteriorate with the
increase in the number of duplicates as shown in Table 3 for Random20 and
even more so for small key values in skewed distributions such as the binomial
collection in Table 4.

Table 3 shows Burstsort and MEBurstsort to be 1.68 and 1.87 times faster
than MTQsort for the Random20 collection. The timings for the binomial
collection (composed of only 11 distinct integers of small values) in Table 4 show
MEBurstsort to be 2.45 times faster than MTQsort. In the binomial collection,
the first three bytes are identical for all keys. After the insertion of the threshold
number of keys (a small fraction of the entire collection) into one container, the
container is burst. For 32-bit keys, this bursting occurs three times in a loop
until the threshold number of keys end up in the lowest level. The rest of the
keys traverse the same path, and since the nodes along that path have already
been created, much of the information pertaining to these keys can be read in
just one access. Only the counters in the lowest level need to be incremented,
so MEBurstsort effectively becomes an in-place algorithm. Both Burstsort and
MEBurstsort are particularly efficient for this kind of skewed data. Since the
main reason for the efficiency of our algorithms is the reduced number of times
that the keys need to be accessed, we expected them to show even better relative
performance for 64-bit keys, compared to the multi-pass LSB variants. Much of
the focus below is on sorting 64-bit integer keys.

Fig. 4. Instructions/Key, 64-bit keys, 1 Mb cache, 8-way associativity, 32 bytes block
size. Upper: Pascal, Lower: Random31.

^1 http://homepages.cwi.nl/~manegold/Calibrator
^2 http://developer.kde.org/~sewardj

Table 5 shows the times for the Pascal collection on 64-bit keys; the results
are stunning. Burstsort and MEBurstsort are 2.37 and 3.33 times faster than
MTQsort, respectively, and faster by even larger factors than the LSB sorting
routines such as EPLSB and EBT. As Figure 3 shows, MEBurstsort and Burstsort
are much faster than the other methods and are competitive with SCSRadix (which
is the fastest 64-bit algorithm to our knowledge). Even for the Random63
collection, as shown in Table 6, both Burstsort and MEBurstsort show the best
timings, though not as dramatically as for Pascal. Similar results are seen for
the real-world web collection. Interestingly, as shown in Figure 3, the time/key
for the sorted collection improves with the increase in the data size. The
performance of our algorithms for most of the other collections such as Bernoulli,
Geometric, Binomial, Poisson and Zero is similar to that for Pascal.

Fig. 5. Cache misses/Key, 64-bit keys, 1 Mb cache, 8-way associativity, 32 bytes block
size. Upper: Random20, Lower: Pascal.

The normalized instructions per key are shown in Figure 4. SCSRadix uses
the least number of instructions for the Pascal collection as it detects the skew.
MTQsort, due to its high complexity cost, has the highest number of
instructions. For the Pascal collection, MEBurstsort and Burstsort are second and
third respectively. As discussed earlier, for skewed collections, MEBurstsort
requires fewer instructions than Burstsort, whereas Burstsort is more efficient
for uniform distributions, as seen from the lower graph in Figure 4.

Figure 5 shows the number of cache misses per key for 64-bit keys incurred
by each algorithm for the collections Random20 and Pascal. Both Burstsort and
MEBurstsort have the least number of cache misses, which shows the
effectiveness of our approach. MEBurstsort has less than one cache miss per key
while Burstsort incurs one to two cache misses per key. This is significant, and
the relative performance of our algorithms will continue to improve with more
modern processors.

We have also investigated the use of memory by our algorithms. The amount
of memory used for the largest set size of all collections for 64-bit keys
is shown in Table 8. Based on similar memory usage, the algorithms have been
classified into four groups: in-place, such as MTQsort and MSBRadix; LSB
radixsorts, such as EBT, LSB, PLSB and EPLSB; CCRadix and SCSRadix; and Burstsort
and MEBurstsort. The memory usage for collections such as Geometric, Pascal,
Binomial, Poisson and Zero is similar to that for Bernoulli. Burstsort uses as much
memory as the LSBs for all collections except for the uniform distributions, where
it requires about 1.2 times more memory than the LSBs. Keep in mind that the
buckets are grown by multiples of 4; smaller growth factors will result in lower
memory usage. MEBurstsort uses up to 1.5 times less memory than Burstsort and,
for skewed data, it is effectively an in-place algorithm.

These results show that Burstsort and MEBurstsort are two of the best 
algorithms and have shown excellent performance on collections with varied 
characteristics for both 32-bit and 64-bit keys. 



Table 8. Relative memory usage (in megabytes) for 64-bit keys with Set 6.

            Burstsort  MEBurstsort  MTQsort  EBT  SCSRadix
Random63    947        665          512      768  800
Random31    947        665          512      768  800
Random20    849        565          512      768  800
Equilikely  853        566          512      768  800
Bernoulli   772        512          512      768  800
Sorted      759        578          512      768  800



6 Conclusions 

In this paper, we have proposed two new algorithms, Burstsort and MEBurstsort, 
for sorting large collections of integer keys. They are based on a novel approach of 
using a compact trie for storing the keys. For the evaluation of these algorithms, 
we have compared them against some of the best known algorithms using several 
collections, both artificially generated and from the real-world. 

Our experiments have shown Burstsort and MEBurstsort to be two of the
fastest sorting algorithms for 64-bit keys, because they have a significantly lower
number of cache misses, as well as a low instruction count. Even for 32-bit keys,
they are the fastest for all collections except for Random31. They have a similar
theoretical cost to the most-significant-digit radix sorts. They both adapt well
to varying distributions such as skew, no skew, sorted, small and large values of
keys. The experiments also confirm our expectation that the larger the size of
keys, the better the relative performance of our algorithms. Thus, while Burstsort
and MEBurstsort are up to 1.7 and 2.5 times faster than MTQsort for 32-bit
keys, they are up to 2.37 and 3.5 times faster than MTQsort for 64-bit keys,
respectively.
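
The speedup factors quoted for individual collections can be checked directly against the largest-set timings in Tables 3, 4 and 5. The short calculation below is only an arithmetic check; the dictionary names and the `speedup` helper are hypothetical, and the timings are copied from those tables.

```python
def speedup(baseline_ms, method_ms):
    """How many times faster a method is than the baseline, given both timings."""
    return baseline_ms / method_ms


# Largest-set timings in milliseconds: Set 7 of Tables 3 and 4, Set 6 of Table 5.
MTQSORT = {"Random20": 32678, "Binomial": 21405, "Pascal": 31761}
BURSTSORT = {"Random20": 19345, "Binomial": 15176, "Pascal": 13395}
MEBURSTSORT = {"Random20": 17442, "Binomial": 8704, "Pascal": 9527}

for name in MTQSORT:
    print(name,
          round(speedup(MTQSORT[name], BURSTSORT[name]), 2),
          round(speedup(MTQSORT[name], MEBURSTSORT[name]), 2))
```

These ratios agree, to within rounding or truncation, with the factors of 1.68 and 1.87 (Random20), 2.45 (Binomial) and 2.37 and 3.33 (Pascal) given in the results section. The factor of 3.5 quoted above for 64-bit keys does not correspond to any of these three tables.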

There is much scope to further improve these algorithms. They are expected
to show better relative performance when applied to sorting pointers to keys.
In preliminary work on sorting pointers to integer keys, Burstsort was found to
be up to 3.72 times faster than MTQsort. In parallel work on strings, we have
observed that randomization techniques are useful in lowering both instructions
and cache misses even further; these techniques should be readily applicable to
sorting integers. The effect of the TLB on our algorithms needs to be investigated.
Preliminary work on using different radix sizes in the trie nodes has shown
promising results; for example, a radix of 11 shows better performance than a
radix of 8 for 32-bit keys. Even without these improvements, Burstsort and
MEBurstsort represent a novel and important advance for efficient sorting of
integer keys.

Abstract. Algorithms for sorting large datasets can be made more efficient
with careful use of memory hierarchies and reduction in the number
of costly memory accesses. In earlier work, we introduced burstsort, a new
string sorting algorithm that on large sets of strings is almost twice as
fast as previous algorithms, primarily because it is more cache-efficient.
The approach in burstsort is to dynamically build a small trie that is
used to rapidly allocate each string to a bucket. In this paper, we
introduce new variants of our algorithm: SR-burstsort, DR-burstsort, and
DRL-burstsort. These algorithms use a random sample of the strings to
construct an approximation to the trie prior to sorting. Our experimental
results with sets of over 30 million strings show that the new variants
reduce cache misses further than did the original burstsort, by up
to 37%, while simultaneously reducing instruction counts by up to 24%.
In pathological cases, even further savings can be obtained.



1 Introduction 

In-memory sorting is a basic problem in computer science. However, sorting
algorithms face new challenges due to changes in computer architecture.
Processor speeds have been increasing at 60% per year, while speed of access to
main memory has been increasing at only 7% per year, a growing processor-memory
performance gap that appears likely to continue. An architectural solution has
been to introduce one or more levels of fast memory, or cache, between the
main memory and the processor. Small volumes of data can be sorted entirely
within cache (typically a few megabytes of memory in current machines) but,
for larger volumes, each random access involves a delay of up to hundreds of
clock cycles.

Much of the research on algorithms has focused on complexity and efficiency
assuming a non-hierarchical RAM model, but these assumptions are not realistic
on modern computer architectures, where the levels of memory have different
latencies. While algorithms can be made more efficient by reducing the number
of instructions, current research [8,15,17] shows that an algorithm can afford to
increase the number of instructions if doing so improves the locality of memory
accesses and thus reduces the number of cache misses. In particular, recent
work [8,13,17] has successfully adapted algorithms for sorting integers to memory
hierarchies.

According to Arge et al. [2] “string sorting is the most general formulation of
sorting because it comprises integer sorting (i.e., strings of length one), multikey
sorting (i.e., equal-length strings) and variable-length key sorting (i.e.,
arbitrarily long strings)”. String sets are typically represented by an array of
pointers to locations where the variable-length strings are stored. Each string
reference incurs at least two cache misses, one for the pointer and one or more
for the string itself depending on its length and how much of it needs to be read.

In our previous work [15,16], we introduced burstsort, a new cache-efficient
string sorting algorithm. It is based on the burst trie data structure [7], where a
set of strings is organised as a collection of buckets indexed by a small access trie.
In burstsort, the trie is built dynamically as the strings are processed. During
the first phase, at most the distinguishing prefix (but usually much less) is read
from each string to construct the access trie and place the string in a bucket,
which is a simple array of pointers. The strings in each bucket are then sorted
using an algorithm that is efficient both in terms of the space and the number
of instructions for small sets of strings. There have been several recent advances
made in the area of string sorting, but our experiments [15,16] showed burstsort
to be much more efficient than previous methods for large string sets. (In this
paper, for reference we compare against three of the best previous string sorting
algorithms: MBM radixsort [9], multikey quicksort [3], and adaptive radixsort
[1,11].) However, burstsort is not perfect. A key shortcoming is that individual
strings must be re-accessed as the trie grows, to redistribute them into
sub-buckets. If the trie could be constructed ahead of time, this cost could be
largely avoided, but the shape and size of the trie strongly depends on the
characteristics of the data to be sorted.

Here, we propose new variants of burstsort: SR-burstsort, DR-burstsort, and 
DRL-burstsort. These use random sampling of the string set to construct an 
approximation to the trie that is built by the original burstsort. Prefixes that 
are repeated in the random sample are likely to be common in the data; thus it 
intuitively makes sense to have these prefixes as paths in the trie. As an efficiency 
heuristic, rather than thoroughly process the sample we simply process them in 
order, using each string to add one more node to the trie. In SR-burstsort, the 
trie is then fixed. In DR-burstsort, the trie can if necessary continue to grow as in 
burstsort, necessitating additional tests but avoiding inefficiency in pathological 
cases. In DRL-burstsort, total cache size is used to limit initial trie size. 
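
The sampling heuristic described above, in which each sampled string adds one more node to the trie, can be sketched as follows. This is one reading of the description rather than the authors' implementation: the trie is represented with nested dictionaries, the sample is drawn with Python's `random` module, and the names `build_sample_trie` and `sampled_trie` are hypothetical.

```python
import random


def build_sample_trie(sample):
    """Grow a trie of nested dicts, adding at most one node per sampled string."""
    root = {}
    for s in sample:
        node = root
        for ch in s:
            if ch not in node:
                node[ch] = {}      # extend the path by exactly one node, then stop
                break
            node = node[ch]
    return root


def sampled_trie(strings, sample_size, seed=0):
    """Draw a simple random sample and build the approximate access trie from it."""
    rng = random.Random(seed)
    sample = rng.sample(strings, min(sample_size, len(strings)))
    return build_sample_trie(sample)


trie = build_sample_trie(["by", "by", "ba", "went"])
```

Because a path is extended only when a prefix is seen again, prefixes that repeat in the sample end up as deeper paths, which is the intuition given in the text.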

We have used several small and large sets of strings, as described in our
earlier work [15,16], for our experiments. SR-burstsort is in some cases slightly
more efficient than burstsort, but in other cases is much slower. DR-burstsort
and DRL-burstsort are more efficient than burstsort in almost all cases, though
with larger collections the amount of improvement decreases. In addition, we
have used a cache simulator to examine individual aspects of the performance,
and have found that in the best cases both the number of cache misses and
the number of instructions fall dramatically compared to burstsort. These new
algorithms are the fastest known way to sort a large set of strings.

2 Background 

In our earlier work [15,16] we examined previous algorithms for sorting strings.
The most efficient of these were adaptive radixsort, multikey quicksort, and
MBM radixsort. Adaptive radixsort was introduced by Andersson and Nilsson
in 1996 [1,11]; it is an adaptation of the distributive partitioning developed by
Dobosiewicz to standard most-significant-digit-first radixsort. The alphabet size
is chosen based on the number of elements to be sorted, switching between 8 bits
and 16 bits. In our experiments, we used the implementation of Nilsson [11].

Multikey quicksort was introduced by Sedgewick and Bentley in 1997 [3]. It 
is a hybrid of ternary quicksort and MSD radixsort. It proceeds character-wise 
and partitions the strings into buckets based upon the value of the character at 
the position under consideration. The partitioning stage proceeds by selecting 
a random pivot and comparing the first character of the strings with the first 
character of the pivot. As in ternary quicksort, the strings are then partitioned 
into three sets — less than, equal to, and greater than — which are then sorted 
recursively. In our experiments, we used an implementation by Sedgewick [3]. 
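
The character-wise three-way partitioning can be illustrated with a compact sketch. Unlike the implementation used in the experiments, this Python version is not in-place: it builds new lists at each step for clarity. It also treats the end of a string as a character smaller than any other. The function name is hypothetical.

```python
import random


def multikey_quicksort(strings, depth=0):
    """Sort strings by three-way partitioning on the character at `depth`."""
    if len(strings) <= 1:
        return list(strings)

    def char_at(s):
        # End of string sorts before every real character.
        return ord(s[depth]) if depth < len(s) else -1

    pivot = char_at(random.choice(strings))
    less = [s for s in strings if char_at(s) < pivot]
    equal = [s for s in strings if char_at(s) == pivot]
    greater = [s for s in strings if char_at(s) > pivot]
    if pivot >= 0:
        # Strings agreeing on this character are ordered by the next one.
        equal = multikey_quicksort(equal, depth + 1)
    return multikey_quicksort(less, depth) + equal + multikey_quicksort(greater, depth)
```

When the pivot is the end-of-string marker, every string in the equal group is identical, so no further recursion is needed for that group.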

MBM radixsort (our nomenclature) is one of several high-performance MSD
radixsort variants tuned for strings that were introduced by McIlroy, Bostic,
and McIlroy [9] in the early 1990s. We used programC, which we found
experimentally to be the most efficient of these variants; we found it to be the
fastest array-based, in-place sorting algorithm for strings.

Burstsort. Any data structure that maintains the data in order can be used as 
the basis of a sorting method. Burstsort is based on this principle. A trie structure 
is used to place the strings in buckets by reading at most the distinguishing 
prefix; this structure is built incrementally as the strings are processed. There 
are two phases; first is insertion of the strings into the burst trie structure, second 
is an in-order traversal, during which the buckets are sorted. 

The trie is built by bursting a bucket once it becomes too large; a new node
is created and the strings in the bucket are inserted into the node, creating
new child buckets. A fixed threshold (the maximum number of strings that can
be held in a bucket) is used to determine whether to burst. Strings that are
completely consumed are managed in a special “end of string” structure.

During the second, traversal phase, if the number of strings in the bucket is 
more than one, then a sorting algorithm that takes the depth of the character of 
the strings into account is used to sort the strings in the bucket. We have used 
multikey quicksort [3] in our experiments. 

In both MSD radixsort and burstsort, the set of strings is recursively
partitioned on their lead characters; then, when a partition is sufficiently small,
it is sorted by a simple in-place method. However, there is a key difference
between radixsorts and burstsort. In the first, trie-construction phase the
standard radixsorts proceed character-wise, processing the first character of each
string, then re-accessing each string to process the next character, and so on.
Each trie node is handled once only, but strings are handled many times. In
contrast, burstsort proceeds string-wise, accessing each string once only to
allocate it to a bucket. Each node is handled many times, but the trie is much
smaller than the data set, and thus the nodes can remain resident in cache.

Fig. 1. A burst trie of four nodes and five buckets.

Figure 1 shows an example of a burst trie containing eleven records whose
keys are “backup”, “balm”, “base”, “by”, “by”, “by”, “by”, “bypass”, “wake”,
“walk”, and “went” respectively. In this example, the alphabet is the set of letters
from A to Z, and in addition an empty string symbol ⊥ is shown; the bucket
structure used is an array. The access trie has four trie nodes and five buckets
in all. The leftmost bucket has three strings, “backup”, “balm” and “base”; the
second bucket has four identical strings “by”; the fourth bucket has two strings,
“wake” and “walk”; and the rightmost bucket has only one string, “went”.
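
The insertion, bursting and traversal steps can be sketched for string keys as follows. This is a simplified illustration, not the authors' code: buckets are Python lists, the threshold is an arbitrary small value, built-in `sorted` stands in for the multikey quicksort used on buckets, and all names are hypothetical. The trie it produces for the keys above is not claimed to match the exact shape drawn in Figure 1.

```python
class Node:
    """Access-trie node: one child per character plus an end-of-string bucket."""

    def __init__(self):
        self.children = {}   # char -> Node, or a list acting as a bucket
        self.ended = []      # strings fully consumed at this node


def insert(node, s, depth, threshold):
    """Walk the trie and append `s` to a bucket, bursting it if it overflows."""
    while depth < len(s):
        ch = s[depth]
        child = node.children.get(ch)
        if isinstance(child, Node):
            node, depth = child, depth + 1
            continue
        if child is None:
            child = node.children[ch] = []
        child.append(s)
        if len(child) > threshold:           # burst: the bucket becomes a trie node
            new_node = Node()
            node.children[ch] = new_node
            for t in child:
                insert(new_node, t, depth + 1, threshold)
        return
    node.ended.append(s)                     # string exhausted at this node


def burstsort(strings, threshold=3):
    """Insert every string, then collect buckets by in-order traversal."""
    root = Node()
    for s in strings:
        insert(root, s, 0, threshold)
    out = []

    def traverse(node):
        out.extend(node.ended)               # exhausted strings precede longer ones
        for ch in sorted(node.children):
            child = node.children[ch]
            if isinstance(child, Node):
                traverse(child)
            else:
                out.extend(sorted(child))    # stand-in for the bucket sort

    traverse(root)
    return out
```

Each string is touched once on insertion and again only when its bucket bursts or is sorted, which is the string-wise access pattern contrasted with radixsort above.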

Experimental results comparing burstsort to previous algorithms are shown 
later. As can be seen, for sets of strings that are significantly larger than the 
available cache, burstsort is up to twice as fast. The gain is largely due to dra- 
matically reduced numbers of cache misses compared to previous techniques. 

Randomised algorithms. A randomised algorithm is one that makes random 
choices during its execution. According to Motwani and Raghavan [10], “two 
benefits of randomised algorithms have made them popular: simplicity and effi- 
ciency. For many applications, a randomised algorithm is the simplest available, 
or the fastest, or both.” 

One application of randomisation for sorting is to rearrange the input in
order to remove any existing patterns, to ensure that the expected running time
matches the average running time [4]. The best-known example of this is in
quicksort, where randomisation of the input lessens the chance of quadratic
running time. Input randomisation can also be used in cases such as binary
search trees to eliminate the worst case when the input sequence is sorted.

Table 1. Statistics of the data collections used in the experiments.

Data set                     Set 1  Set 2  Set 3   Set 4   Set 5    Set 6
Duplicates
  Size Mb                    1.013  3.136   7.954  27.951   93.087  304.279
  Distinct Words (x10^5)     0.599  1.549   3.281   9.315   25.456   70.246
  Word Occurrences (x10^5)   1      3.162  10      31.623  100      316.230
No duplicates
  Size Mb                    1.1    3.212  10.796  35.640  117.068  381.967
  Distinct Words (x10^5)     1      3.162  10      31.623  100      316.230
  Word Occurrences (x10^5)   1      3.162  10      31.623  100      316.230
Genome
  Size Mb                    0.953  3.016   9.537  30.158   95.367  301.580
  Distinct Words (x10^5)     0.751  1.593   2.363   2.600    2.620    2.620
  Word Occurrences (x10^5)   1      3.162  10      31.623  100      316.230
Random
  Size Mb                    1.004  3.167  10.015  31.664  100.121  316.606
  Distinct Words (x10^5)     0.891  2.762   8.575  26.833   83.859  260.140
  Word Occurrences (x10^5)   1      3.162  10      31.623  100      316.230
URL
  Size Mb                    3.030  9.607  30.386  96.156  304.118  -
  Distinct Words (x10^5)     0.361  0.923   2.355   5.769   12.898  -
  Word Occurrences (x10^5)   1      3.162  10      31.623  100      -

Another application of randomisation is to process a small sample from a 
larger collection. In simple random sampling, each individual key in a collection 
has an equal chance of being selected. According to Olken and Rotem [12], 

Random sampling is used on those occasions when processing the entire 
dataset is unnecessary and too expensive . . . The savings generated by 
sampling may arise either from reductions in the cost of retrieving the 
data ... or from subsequent postprocessing of the sample. Sampling is 
useful for applications which are attempting to estimate some aggregate 
property of a set of records. 



3 Burstsort with Random Sampling 

In earlier work [15], we showed that burstsort is efficient in sorting strings because 
of its low rate of cache misses compared to other string sorting methods. Cache 
misses occur when the string is fetched for the first time, during a burst, and 
during the traversal phase when the bucket is sorted. Our results indicated that 
the threshold size should be selected such that the average number of cache 
misses per key during the traversal phase is close to 1. 

Table 2. Duplicates, sorting time for each method (milliseconds). 

Threshold  Method               Set 1   Set 2   Set 3   Set 4    Set 5    Set 6 

           Multikey quicksort      62     272     920   3,830   14,950   56,070 
           MBM radixsort           58     238     822   3,650   15,460   61,560 
           Adaptive radixsort      74     288     900   3,360   12,410   51,870 
           SR-burstsort            60     200     560   2,010    7,620   31,040 

8192       Burstsort               58     218     630   2,220    7,950   29,910 
           DR-burstsort            60     200     560   2,030    7,390   28,530 
           DRL-burstsort           60     200     560   2,030    7,510   29,030 

16384      Burstsort               60     210     630   2,270    7,970   28,490 
           DR-burstsort            60     200     550   2,020    7,280   27,310 

32768      Burstsort               60     210     630   2,380    8,250   28,530 
           DR-burstsort            60     200     560   2,010    7,160   27,400 

65536      Burstsort               60     210     640   2,480    8,590   29,620 
           DR-burstsort            60     200     560   2,010    7,150   26,640 

131072     Burstsort               60     220     660   2,550    9,190   31,260 
           DR-burstsort            60     200     560   2,010    7,140   27,420 

Most cache misses occur while the strings are being inserted into the trie. 
One way in which cache misses could be reduced during the insertion phase is 
if the trie could be built beforehand, avoiding bursts and allowing strings to be 
placed in the trie with just one access, giving — if everything has gone well — a 
maximum of two accesses to a string overall, once during insertion and once 
during traversal. This is an upper bound, as some strings need not be referenced 
in the traversal phase and, as the insertion is a sequential scan, more than one 
string may fit into a cache line. 

We propose building the trie beforehand using a random sample of the strings, 
which can be used to construct an approximation to the trie. The goal of the 
sampling is to get as close as possible to the shape of the trie constructed by 
burstsort, so that the strings are distributed evenly among the buckets, which 
can then be efficiently sorted in the cache. However, the cost of processing the 
sample should not be too great, or it can outweigh the gains. As a heuristic, we 
make just one pass through the sample, and use each string to suggest one 
additional trie node. 

Table 3. Genome, sorting time for each method (milliseconds). 

Threshold  Method               Set 1   Set 2   Set 3   Set 4    Set 5    Set 6 

           Multikey quicksort      72     324   1,250   4,610   16,670   62,680 
           MBM radixsort           72     368   1,570   6,200   23,700   90,700 
           Adaptive radixsort      92     404   1,500   4,980   17,800   66,100 
           SR-burstsort            70     240     780   2,530   10,320   44,810 

8192       Burstsort               70     258     870   2,830    8,990   31,540 
           DR-burstsort            70     240     770   2,470    7,960   30,870 
           DRL-burstsort           70     240     770   2,460    8,410   30,680 

16384      Burstsort               70     290     910   2,760    8,720   30,280 
           DR-burstsort            70     240     780   2,390    7,520   27,850 

32768      Burstsort               80     280     940   3,000    9,520   31,140 
           DR-burstsort            60     240     770   2,390    7,560   28,780 

65536      Burstsort               70     310   1,010   3,130    9,820   32,860 
           DR-burstsort            70     240     770   2,400    7,520   28,710 

131072     Burstsort               80     300   1,070   3,400   10,940   36,630 
           DR-burstsort            70     230     770   2,400    7,570   28,740 

Sampling process. 

1. Create an empty trie root node r, where a trie node is an array of pointers 
(to either trie nodes or buckets). 

2. Choose a sample size R, and create a stack of R empty trie nodes. 

3. Draw a random sample of R strings from the input data. 

4. For each string $c_1 \ldots c_n$ in the sample, 

a) Use the string to traverse the trie until the current character corresponds 
to a null pointer. That is, set $p \leftarrow r$ and $i \leftarrow 1$, and, until 
$p[c_i]$ is null, continue by setting $p \leftarrow p[c_i]$ and incrementing $i$. 
For example, on insertion of “michael”, if “mic” was already a path in the trie, 
a node is added for “h”. 

b) If the string is not exhausted, that is, $i < n$, take a new node $t$ from the 
stack and set $p[c_i] \leftarrow t$. 

The sampled strings are not stored in the buckets; to maintain stability, they 
are inserted when encountered during the main sorting process. The minimum 
number of trie nodes created is 1 if all the strings in the collection are identical 
and of length 1. The maximum number of trie nodes created is equal to the size 
of the sample and is more likely in collections such as the random collection. 
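
The sampling steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in the experiments: strings are assumed to be `bytes` objects, a node is assumed to have 256 pointer slots, the sample is drawn with replacement, and the names `new_node` and `build_sampled_trie` are hypothetical. The condition in step 4(b) is written for 0-based indexing, so `i < len(s) - 1` corresponds to the 1-based test i < n.

```python
import random

ALPHABET_SIZE = 256  # assumption: one pointer slot per byte value


def new_node():
    """A trie node is an array of pointers (child nodes only in this sketch)."""
    return [None] * ALPHABET_SIZE


def build_sampled_trie(strings, sample_size, rng=random):
    """Build an initial trie from a random sample of the input strings.

    Each sampled string adds at most one node, so at most `sample_size`
    nodes are created in addition to the root.
    """
    root = new_node()                                   # step 1
    stack = [new_node() for _ in range(sample_size)]    # step 2
    # Step 3: sampling with replacement is an assumption of this sketch.
    sample = [rng.choice(strings) for _ in range(sample_size)]
    nodes_used = 0
    for s in sample:                                    # step 4
        p, i = root, 0
        # (a) follow existing nodes until a null pointer (or the string ends)
        while i < len(s) and p[s[i]] is not None:
            p = p[s[i]]
            i += 1
        # (b) 0-based form of the test "i < n": one new node off the path
        if i < len(s) - 1:
            p[s[i]] = stack.pop()
            nodes_used += 1
    return root, nodes_used
```

With a collection of identical one-character strings no extra node is created and the trie consists of the root alone, which matches the minimum of one node stated above.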

The intuition behind this approach is that, if a prefix is common in the data 
then there will be several strings in the sample with that prefix. The sampling 
algorithm will then construct a branch of trie nodes corresponding to that prefix. 

Table 4. URLs, sorting time for each method (milliseconds). The fastest times in the 
burstsort family are shown in bold. 

Threshold  Method               Set 1   Set 2   Set 3   Set 4    Set 5 

           SR-burstsort           100     360   1,310   5,350   19,420 

8192       Burstsort              110     390   1,530   5,080   17,860 
           DR-burstsort           110     370   1,450   4,860   17,130 
           DRL-burstsort          100     370   1,450   4,850   17,610 

16384      Burstsort              110     390   1,630   5,280   18,800 
           DR-burstsort           110     380   1,530   4,890   17,350 

32768      Burstsort              130     420   1,510   6,710   21,560 
           DR-burstsort           110     370   1,380   5,890   18,670 

65536      Burstsort              170     440   1,540   6,290   24,010 
           DR-burstsort           110     370   1,380   5,410   19,360 

131072     Burstsort              140     480   1,550   6,310   27,120 
           DR-burstsort           110     370   1,340   5,330   19,830 

Fig. 2. L2 cache misses for the most efficient sorting algorithms; burstsort has a 
threshold of 8192. Panels: (a) Duplicates, (b) Genome. Horizontal axis: size of 
dataset (millions). 

For example, in an English dictionary (from the utility ispell) of 127,001 
strings, seven begin with “throu”, 75 with “thro”, 178 with “thr”, 959 with “th”, 
and 6713 with “t”. Suppose we sample 127 times with replacement, corresponding 
to an expected bucket size of 1000. Then the probability of sampling “throu” 
is only 0.01, of “thro” is 0.07, of “thr” is 0.16, of “th” is 0.62, and of “t” is 0.999. 
With a bucket size of 1000, a burst trie would allocate a node corresponding to 
the path “t” and would come close to allocating a node for “th”. Under sampling, 
it is almost certain that a node will be allocated for “t” (there is an even 
chance that it would be one of the first 13 nodes allocated) and likely that a 
node would be allocated for “th”. Nodes for the deeper paths are unlikely. 
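
The probabilities quoted in the ispell example can be checked with a short calculation. The sketch below assumes only what the text states (127 draws with replacement from 127,001 strings); the function name `prob_prefix_sampled` is hypothetical. The chance that at least one of the draws has a given prefix is 1 - (1 - k/N)^R, where k is the number of strings with that prefix.

```python
def prob_prefix_sampled(count, total, draws):
    """Probability that at least one of `draws` samples (with replacement)
    from `total` strings is among the `count` strings sharing a prefix."""
    return 1.0 - (1.0 - count / total) ** draws


TOTAL = 127_001   # dictionary size given in the text
DRAWS = 127       # sample size given in the text
COUNTS = {"throu": 7, "thro": 75, "thr": 178, "th": 959, "t": 6713}

probs = {prefix: prob_prefix_sampled(k, TOTAL, DRAWS) for prefix, k in COUNTS.items()}

# Chance that a string beginning with "t" shows up within the first 13 draws.
first_13_t = prob_prefix_sampled(COUNTS["t"], TOTAL, 13)

for prefix, p in probs.items():
    print(f"{prefix:6s} {p:.3f}")
print(f"'t' within first 13 draws: {first_13_t:.2f}")
```

The values agree with the rounded figures in the text, and the last line shows why a node for “t” has roughly an even chance of being among the first 13 allocated.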

SR-burstsort. In burstsort, the number of trie nodes created is roughly linear in 
the size of the set to be sorted. It is therefore attractive that the number of nodes 
allocated through sampling be a fixed percentage of the number of elements in 
the set; by the informal statistical argument above, the trie created in the initial 
phase should approximate the trie created by applying standard burstsort to the 
same data. In static randomised burstsort, or SR-burstsort, the trie structure 
created by sampling is then static. The structure grows only through addition of 
strings to buckets. The use of random sampling means that common prefixes will 
in the great majority of runs be represented in the trie and strings will distribute 
well amongst the buckets. 

Table 5. Artificial sets, sorting time for each method (milliseconds). 

Threshold  Method          Artificial A  Artificial B  Artificial C 

           SR-burstsort           2,650         9,220         1,600 

8192       Burstsort              2,740        10,130         1,430 
           DR-burstsort           2,340         9,080         1,300 

16384      Burstsort              2,510        10,110         1,460 
           DR-burstsort           2,320         8,890         1,340 

32768      Burstsort              2,910        10,540         1,880 
           DR-burstsort           2,320         8,110         1,430 

65536      Burstsort              3,760        11,210         2,610 
           DR-burstsort           2,340         8,010         1,540 

131072     Burstsort              5,190        11,820         3,810 
           DR-burstsort           2,320         7,890         1,670 

262144     Burstsort              7,900        13,200         5,660 
           DR-burstsort           2,290         7,930         1,570 

Table 6. Random, sorting time for each method (milliseconds). 

Threshold  Method               Set 1   Set 2   Set 3   Set 4    Set 5    Set 6 

           SR-burstsort            50     170     570   1,930    7,060   29,410 

8192       Burstsort               50     180     650   2,100    6,450   23,040 
           DR-burstsort            50     180     580   2,050    6,910   30,790 
           DRL-burstsort           50     180     570   2,050    6,470   23,340 

For a set of N strings, we need to choose a sample size. We use a relative trie 
size parameter S. For our experiments we used S = 8192, because this value 
was an effective bucket-size threshold in our earlier work. Then the sample size, 
and the maximum number of trie nodes that can be created, is R = N/S. 

SR-burstsort proceeds as follows: use the sampling procedure above to build 
an access trie; insert the strings in turn into buckets; then traverse the trie 
and buckets to give the sorted result. No bursts occur. Buckets are a linked list 
of arrays of a fixed size (an implementation decision derived from preliminary 
experiments). The last element in each array is a pointer to the next array. In 
our experiments we have used an array size of 32. 
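
The three phases of SR-burstsort described above (sample, insert, traverse) can be sketched as follows. This is an illustrative simplification under several assumptions: nodes are dictionaries rather than pointer arrays, buckets are plain Python lists rather than linked lists of 32-element arrays, Python's built-in `sorted` stands in for the bucket sorting routine, and the names `Node`, `build_trie`, and `sr_burstsort` are hypothetical.

```python
import random


class Node:
    def __init__(self):
        self.children = {}   # byte value -> Node (fixed after sampling)
        self.buckets = {}    # byte value -> list of strings, never burst
        self.exhausted = []  # strings that end exactly at this node


def build_trie(strings, sample_size, rng):
    """Sampling phase: each sampled string adds at most one trie node."""
    root = Node()
    for _ in range(sample_size):
        s = rng.choice(strings)
        p, i = root, 0
        while i < len(s) and s[i] in p.children:
            p, i = p.children[s[i]], i + 1
        if i < len(s) - 1:
            p.children[s[i]] = Node()
    return root


def sr_burstsort(strings, rel_trie_size=8192, seed=0):
    """Sort a list of bytes objects with a static, sample-built trie."""
    if not strings:
        return []
    rng = random.Random(seed)
    sample_size = max(1, len(strings) // rel_trie_size)  # R = N/S
    root = build_trie(strings, sample_size, rng)

    # Insertion phase: one pass, no bursts, buckets grow without limit.
    for s in strings:
        p, i = root, 0
        while i < len(s) and s[i] in p.children:
            p, i = p.children[s[i]], i + 1
        if i == len(s):
            p.exhausted.append(s)
        else:
            p.buckets.setdefault(s[i], []).append(s)

    # Traversal phase: visit slots in byte order, sorting each bucket.
    out = []

    def traverse(node):
        out.extend(node.exhausted)
        for c in sorted(set(node.children) | set(node.buckets)):
            if c in node.children:
                traverse(node.children[c])
            else:
                out.extend(sorted(node.buckets[c]))

    traverse(root)
    return out
```

Because the trie never changes after sampling, each slot of a node holds either a child or a bucket, never both, which is what keeps the insertion loop free of threshold checks.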

SR-burstsort has several advantages compared to the original algorithm. The 
code is simpler, with no thresholds or bursting, thus requiring far fewer instruc- 
tions during the insertion phase. Insertion also requires fewer string accesses. 
The nodes are allocated as a block, simplifying dynamic memory management. 

However, bucket size is not capped, and some buckets may not fit entirely 
within the cache. The bucket sorting routine is selected mainly for its instruction 
and space efficiency for small sets of strings and not for cache efficiency. 
Moreover, small changes in the trie shape can lead to large variations in bucket 
size: omitting a single crucial trie node due to sampling error may mean that a 
very large bucket is created. 

Fig. 3. Instructions per element on each data set, for each variant of burstsort for a 
threshold of 32768. Panels: (a) Duplicates, (b) URLs. 

Fig. 4. L2 cache misses per element on each data set, for each variant of burstsort for 
a threshold of 32768. Panels: (a) Duplicates, (b) URLs. 



DR-burstsort. An obvious next step is to eliminate the cases in SR-burstsort 
when the buckets become larger than cache and bucket sorting is not entirely 
cache-resident. This suggests dynamic randomised burstsort, or DR-burstsort. In 
this approach, an initial trie is created through sampling as before, but as in the 
original burstsort a limit is imposed on bucket size and buckets are burst if this 
limit is exceeded. DR-burstsort avoids the bad cases that arise in SR-burstsort 
due to sampling errors. The number of bursts should be small, but, compared 
to SR-burstsort, additional statistics must be maintained. 

Thus DR-burstsort is as follows: using a relative trie size S, select a sample of 
R = N/S strings and create an initial trie; insert the strings into the trie as for 
burstsort; then traverse as for burstsort or SR-burstsort. Buckets are represented 
as arrays of 16, 128, 1024, or 8192 pointers, growing from one size to the next 
as the number of strings to be stored increases, as we have described elsewhere 
for burstsort [16]. 
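
The bursting step that DR-burstsort retains from the original algorithm can be illustrated with a small sketch. It assumes strings are `bytes` objects and represents the new node as a dictionary; the names `burst` and `insert_with_limit` and this representation are hypothetical, and the array growth policy mentioned above is not modelled.

```python
def burst(bucket, depth):
    """Replace a bucket by a new trie node.

    All strings in `bucket` share their first `depth` bytes; they are
    redistributed into child buckets keyed by the byte at position `depth`.
    Strings of length exactly `depth` are kept in a separate list.
    """
    node = {"exhausted": [], "buckets": {}}
    for s in bucket:
        if len(s) == depth:
            node["exhausted"].append(s)
        else:
            node["buckets"].setdefault(s[depth], []).append(s)
    return node


def insert_with_limit(bucket, s, depth, limit):
    """Append `s` to `bucket`; return a burst node if the limit is exceeded,
    otherwise return None and leave the bucket in place."""
    bucket.append(s)
    if len(bucket) > limit:
        return burst(bucket, depth)
    return None
```

In SR-burstsort the second function would reduce to a bare append, which is where its lower instruction count comes from; the limit check is the additional bookkeeping that DR-burstsort pays for bounded bucket sizes.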

DRL-burstsort. For the largest sets of strings, the trie is much too large to be 
cache resident. That is, there is a trade-off between whether the largest bucket 
can fit in cache and whether the trie can fit in cache. One approach is to stop 
bursts at some point, especially as bursts late in the process are not as helpful. 
We have not explored this approach, as it would be unsuccessful with sorted 
data. 

Another approach is to limit the size of the initial trie to fit in cache, to 
avoid the disadvantages of extraneous nodes being created. This variant, DR- 
burstsort with limit or DRL-burstsort, is tested below. The limit used in our 
experiments depends on the size of the cache and the size of the trie nodes. In 
our experiments, we chose R so that R times node size is equal to the cache size. 
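
The two ways of choosing the sample size R can be contrasted with a short calculation. The sketch assumes the 1 Mb L2 cache reported for the experiments and a hypothetical node size of 1024 bytes (256 four-byte pointers); the node size and the function names are assumptions, not figures from the text.

```python
def dr_sample_size(n_strings, rel_trie_size=8192):
    """DR-burstsort: R = N / S, so the initial trie grows with the input."""
    return n_strings // rel_trie_size


def drl_sample_size(cache_bytes, node_bytes):
    """DRL-burstsort: R chosen so that R * node size equals the cache size."""
    return cache_bytes // node_bytes


CACHE_BYTES = 1 << 20   # 1 Mb L2 cache, as in the experimental setup
NODE_BYTES = 256 * 4    # assumption: 256 pointer slots of 4 bytes each

for n in (100_000, 1_000_000, 31_623_000):
    print(n, dr_sample_size(n), drl_sample_size(CACHE_BYTES, NODE_BYTES))
```

Under these assumptions the DRL limit is constant at 1024 nodes, while N/S exceeds it only for the largest sets.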



4 Experiments 

For realistic experiments with large sets of strings, we are limited to sources 
for which we have sufficient volumes of data. We have drawn on web data and 
genomic data. For the latter, we have parsed nucleotide strings into overlapping 
9-grams. For the former, derived from the TREC project [5,6], we extracted both 
words — alphabetic strings delimited by non-alphabetic characters — and URLs. 
For the words, we considered sets with and without duplicates, in both cases in 
order of occurrence in the original data. 

For the word data and genomic data, we created six subsets, of approximately 
$10^5$, $3.1623 \times 10^5$, $10^6$, $3.1623 \times 10^6$, $10^7$, and 
$3.1623 \times 10^7$ strings each. We call these Set 1, Set 2, Set 3, Set 4, 
Set 5, and Set 6 respectively. For the URL data, we created Set 1 to Set 5. 
In each case, only Set 1 fits in cache. In detail, the data sets are as follows. 

Duplicates. Words in order of occurrence, including duplicates. The statistical 
characteristics are those of natural language text; a small number of words 
are frequent, while many occur once only. 

No duplicates. Unique strings based on word pairs in order of first occurrence 
in the TREC web data. 

Genome. Strings extracted from a collection of genomic strings, each typically 
thousands of nucleotides long. The strings are parsed into shorter strings of 
length 9. The alphabet comprises four characters, “a”, “t”, “g”, and 
“c”. There is a large number of duplicates and the data shows little locality. 

Random. An artificially generated collection of strings whose characters are 
uniformly distributed over the entire ASCII range. The length of each string 
is random in the range 1-20. 

URL. Complete URLs, in order of occurrence and with duplicates, from the 
TREC web data. Average length is high compared to the other sets of strings. 

Artificial A. A collection of identical strings on an alphabet of one character. 
Each string is one hundred characters long and the size of the collection is 
one million. 

Artificial B. A collection of strings with an alphabet of nine characters. The 
lengths of the strings vary randomly from one to one hundred and the size of 
the collection is ten million. 

Artificial C. A collection of strings whose length ranges from one to one hundred. 
The alphabet size is one and the strings are ordered in increasing length, 
arranged cyclically. The size of the collection is one million. 
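
The synthetic collections and the genome parsing described above are simple enough to sketch directly. The generators below are illustrative only: the function names are hypothetical, the concrete characters chosen for the one-character and nine-character alphabets are assumptions, and “the entire ASCII range” is taken here to mean byte values 1 to 127.

```python
import random


def overlapping_ngrams(sequence, n=9):
    """Parse a nucleotide string into overlapping n-grams (n = 9 in the text)."""
    return [sequence[i:i + n] for i in range(len(sequence) - n + 1)]


def random_collection(count, rng, max_len=20):
    """Random: uniformly distributed characters, lengths uniform in 1..max_len."""
    return [bytes(rng.randint(1, 127) for _ in range(rng.randint(1, max_len)))
            for _ in range(count)]


def artificial_a(count=1_000_000, length=100):
    """Artificial A: identical strings over a one-character alphabet."""
    return [b"a" * length] * count


def artificial_b(count, rng, max_len=100, alphabet=b"abcdefghi"):
    """Artificial B: nine-character alphabet, lengths uniform in 1..max_len."""
    return [bytes(rng.choice(alphabet) for _ in range(rng.randint(1, max_len)))
            for _ in range(count)]


def artificial_c(count=1_000_000, max_len=100):
    """Artificial C: one-character alphabet, lengths 1, 2, ..., max_len, 1, 2, ..."""
    return [b"a" * (i % max_len + 1) for i in range(count)]
```

Artificial A and C are the adversarial cases for a burst trie, since every string follows the same path and each burst moves the whole bucket one level deeper.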

The cost of bursting increases with the size of the container as more strings 
need to be fetched from memory, leading to increases in the number of cache 
misses and of instructions. Each correct prediction of a trie node removes the 
need to burst a container. Another situation where bursting could be expensive 
is use of inefficient data structures such as binary search trees or linked lists as 
containers. Traversing a linked list could result in two memory accesses for each 
container element, one access to the string and one access to the list node. To 
show how sampling can be beneficial as bursting becomes more expensive, we 
have measured the running time, instruction count and cache misses as the size 
of the container is increased from 1024 to 131,072, or, for the artificial collections, 
up to 262,144. 

The aim of the experiments is to compare the performance of our algorithms, 
in terms of the running time, number of instructions, and number of L2 cache 
misses. The time measured is to sort an array of pointers to strings; the array is 
returned as the output. We therefore report the CPU times, not elapsed times, 
and exclude the time taken to parse the collections into strings. 

The experiments were run on a Pentium III Xeon 700 MHz computer with 
2 Gb of internal memory, 1 Mb L2 cache with block size of 32 bytes, 8-way 
associativity and a memory latency of about 100 cycles. We have used the highest 
compiler optimization level, O3, in all our experiments. The total number of 
milliseconds of CPU time has been measured; the time taken for I/O or to parse the 
collection is not included, as these costs are common to all algorithms. For the 
cache simulations, we have used valgrind [14]. 



5 Results 

We present results in three forms: time to sort each data set, instruction counts, 
and L2 cache misses. Times for sorting are shown in Tables 2 to 6. Instruction 
counts are shown in Figures 3 and 5. L2 cache misses are shown in Figures 2, 4 
and 6; the trends for the other data sets are similar. 

On duplicates, the sorting times for the burstsort methods are, for all cases 
but Set 1, faster than for the previous methods. These results are as observed in 
our previous work. The performance gap steadily grows with data set size, and 
the indications from all the results (instructions, cache misses, and timings) are 
that the improvements yielded by burstsort will continue to increase with both 
changes in computer architecture and growing data volumes. Figure 2 shows the 
L2 cache misses in comparison to the best algorithms found in our earlier work. 

Fig. 5. Instructions per element for the largest data set, for each variant of burstsort. 
Panels: (a) Genome, (b) Artificial A, (c) Duplicates, (d) URLs. Horizontal axis: size 
of container (keys). 

Figures 3 and 4 show the number of instructions and L2 cache misses for a 
container size of 32768. Several overall trends can be observed. The number of 
instructions per string does not vary dramatically for any of the methods, though 
it does have perturbations due to characteristics of the individual data sets. SR- 
burstsort consistently uses fewer instructions than the other methods, while the 
original burstsort requires the most. Amongst the burstsorts, SR-burstsort is 
consistently the slowest for the larger sets due to more L2 cache misses than 
burstsort, despite requiring fewer instructions. 

For most collections, either DR-burstsort or DRL-burstsort is the fastest 
sorting technique, and they usually yield similar results. Compared to burstsort, 
DR-burstsort uses up to 24% fewer instructions and incurs up to 37% fewer 
cache misses. However, there are exceptions, in particular DRL-burstsort has 
done much better than DR-burstsort on the random data; on this data, burstsort 
is by a small margin the fastest method tested. The heuristic in DRL-burstsort 
of limiting the initial trie to the cache size has led to clear gains in this case, in 
which the sampling process is error-prone. 

Some of the data sets have individual characteristics that affect the trends. 
In particular, with the fixed length of the strings in the genome data, increasing 
the number of strings does not increase the number of distinct strings, and thus 
the relative costs of sorting under the different methods change with increasing 
data set size. In contrast, with duplicates the number of distinct strings continues 
to grow steadily. 

Fig. 6. L2 cache misses per element for the largest data set, for each variant of burstsort. 



The sorting times shown in Tables 2 to 6 show that as the size of the 
container increases, burstsort becomes more expensive. On the other hand, the 
cost of DR-burstsort does not vary much with increasing container size. Table 5 
shows DR-burstsort can be as much as 3.5 times faster than burstsort. As shown 
in Figure 5, the number of instructions incurred by DR-burstsort can be up 
to 30% less than for burstsort. Interestingly, the number of instructions does not 
appear to vary much as the size of the container increases. Figure 6 similarly 
shows that the number of misses incurred by DR-burstsort can be up to 90% 
less than for burstsort. 

All of the new methods require fewer instructions than the original burstsort. 
More importantly, in most cases DR-burstsort and DRL-burstsort require fewer 
cache misses. This trend means that, as the hardware performance gap grows, 
the relative performance of our new methods will continue to improve. 

6 Conclusions 

We have proposed new algorithms (SR-burstsort, DR-burstsort, and DRL-burstsort) 
for fast sorting of strings in large data collections. They are variants 
of our burstsort algorithm and are based on construction of a small trie that 
rapidly allocates strings to buckets. In the original burstsort, the trie was 
constructed dynamically; the new algorithms are based on taking a random sample 
of the strings and using them to construct an initial trie structure before any 
strings are inserted. 

SR-burstsort, where the trie is static, reduces the need for dynamic memory 
management and simplifies the insertion process, leading to code with a lower 
instruction count than the other alternatives. Despite promising performance in 
preliminary experiments and the low instruction count, however, it is generally 
slower than burstsort, as there can easily be bad cases where a random sample 
does not correctly predict the trie structure, which leads to some buckets being 
larger than expected. 

DR-burstsort and DRL-burstsort improve on the worst case of SR-burstsort 
by allowing the trie to be modified dynamically, at the cost of additional checks 
during insertion. They are faster than burstsort in all experiments with real 
data, due to elimination of the need for most of the bursts. The use of a limit in 
DRL-burstsort avoids poor cases that could arise in data with a flat distribution. 

Our experimental results show that the new variants reduce cache misses even 
further than does the original burstsort, by up to 37%, while simultaneously 
reducing instruction counts by up to 24%. As the cost of bursting grows, the 
new variants reduce cache misses by up to 90%, while simultaneously reducing 
instruction counts by up to 30% and the time to sort is reduced by up to 72% 
as compared to burstsort. 

There is scope to further improve these algorithms. Pre-analysis of collec- 
tions to see whether the alphabet is restricted showed an improvement of 16% 
for genomic collections. Pre-analysis would be of value for customised sorting 
applications. Another variation is to choose the sample size based on analysis of 
collection characteristics. A further variation is to recursively apply SR-burstsort 
to large buckets. We are testing these options in current work. 

Even without these improvements, however, burstsort and its variants are 
a significant advance, dramatically reducing the costs of sorting a large set of 
strings. Cache misses and running time are as low as half that required by any 
previous method. With the current trends in computer architecture, the relative 
performance of our methods will continue to improve. 



Abstract. In this paper we investigate the datapath merging problem 
(DPM) in reconfigurable systems. DPM is $\mathcal{NP}$-hard and it is described 
here in terms of a graph optimization problem. We present an Integer 
Programming (IP) formulation of DPM and introduce some valid 
inequalities for the convex hull of integer solutions. These inequalities form 
the basis of a branch-and-cut algorithm that we implemented. This 
algorithm was used to compute lower bounds for a set of DPM instances, 
allowing us to assess the performance of the heuristic proposed by 
Moreano et al. [1], which is among the best ones available for the problem. 
Our computational experiments confirmed the efficiency of Moreano’s 
heuristic. Moreover, the branch-and-cut algorithm also proved to be 
a valuable tool to solve small-sized DPM instances to optimality. 



1 Introduction 

It is well known that embedded systems must meet strict constraints of high 
throughput, low power consumption and low cost, especially when designed for 
signal processing and multimedia applications [2]. These requirements lead to 
the design of application specific components, ranging from specialized functional 
units and coprocessors to entire application specific processors. Such components 
are designed to exploit the peculiarities of the application domain in order to 
achieve the necessary performance and to meet the design constraints. 

With the advent of reconfigurable systems, the availability of large/cheap 
arrays of programmable logic has created a new set of architectural alternatives for 
the design of complex digital systems [3,4]. Reconfigurable logic brings together 
the flexibility of software and the performance of hardware [5,6]. As a result, it 
became possible to design application specific components, like specialized 
datapaths, that can be reconfigured to perform a different computation, according 
to the specific part of the application that is running. At run-time, as each 
portion of the application starts to execute, the system reconfigures the datapath 
so as to perform the corresponding computation. Recent work in reconfigurable 
computing research has shown that a significant performance speedup can be 
achieved through architectures that map the most time-consuming application 
kernel modules or inner-loops to a reconfigurable datapath [7,8,9]. 

The reconfigurable datapath should have as few and simple hardware blocks 
(functional units and registers) and interconnections (multiplexors and wires) 
as possible, in order to reduce its cost, area, and power consumption. Thus 
hardware blocks and interconnections should be reused across the application 
as much as possible. Resource sharing also has a crucial impact in reducing the 
system reconfiguration overhead, both in time and space. 

To design such a reconfigurable datapath, one must represent each selected 
piece of the application as a control/data-flow graph (CDFG) and merge them 
together, synthesizing a single reconfigurable datapath. The control/data-flow 
graph merging process enables the reuse of hardware blocks and interconnections 
by identifying similarities among the CDFGs, and produces a single datapath 
that can be dynamically reconfigured to work for each CDFG. Ideally, the 
resulting datapath should have the minimum area cost. Ultimately, this 
corresponds to minimizing the number of hardware blocks and interconnections in the 
reconfigurable datapath. The datapath merging problem (DPM) seeks such an 
optimal merging and is known to be $\mathcal{NP}$-hard [10]. 

To minimize the area cost one has to minimize the total area required by both 
hardware blocks and interconnections in the reconfigurable datapath. However, 
since the area occupied by hardware blocks is typically much larger than that 
occupied by the interconnections, engineers are only interested in solutions 
that use as few hardware blocks as possible. Clearly, the minimum quantity of 
blocks required for each type of hardware block is given by the maximum number 
of blocks of that type needed by any of the CDFGs given as input. The minimum 
number of hardware blocks in the reconfigurable datapath can be computed 
as the sum of these individual minima. As a consequence, DPM reduces to 
the problem of finding the minimum number of interconnections necessary to 
implement the reconfigurable datapath. 
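
The block count argument above amounts to taking, for each block type, the maximum over the input graphs and then summing. A minimal sketch, assuming each CDFG is given simply as a list of block-type labels (the function name and the example labels are hypothetical):

```python
from collections import Counter


def min_hardware_blocks(datapaths):
    """Return (per-type minimum, total) for a list of CDFGs.

    Each CDFG is a list of block-type labels, one per vertex. For every type
    the merged datapath needs as many blocks as the most demanding CDFG.
    """
    need = Counter()
    for labels in datapaths:
        for block_type, k in Counter(labels).items():
            need[block_type] = max(need[block_type], k)
    return dict(need), sum(need.values())


# Hypothetical instance: two loops with adders, multipliers and a register.
per_type, total = min_hardware_blocks([
    ["add", "add", "mul"],
    ["add", "mul", "mul", "reg"],
])
print(per_type, total)
```

Since this quantity is fixed by the input, the only remaining freedom in a merging is which vertices of equal type are identified, and hence how many arcs coincide.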

Fig. 1 illustrates the concept of control/data-flow graph merging and the 
problem we are tackling. For simplicity, the multiplexors, which select the inputs 
for certain functional blocks, are not represented. The graphs G' and G represent 
two mappings of the CDFGs G1 and G2. In both these mappings, vertices 
a1 and a5 from G1 are mapped onto vertices b1 and b3 from G2, respectively, 
while vertex a4 of G1 has no counterpart in G2. The difference between the two 
mappings is that, in G', vertex b2 of G2 is mapped onto vertex a2 of G1, while 
it is mapped onto a3 in G. The mappings G' and G are both feasible since they 
only match hardware blocks that are logically equivalent. Though their 
reconfigurable datapaths have the same number of hardware blocks, in G' no arcs are 
overlapped while in G the arcs (a3, a5) and (b2, b3) coincide (see the highlighted 
arc in Fig. 1). In practical terms, this means that one less multiplexor is needed 
and, therefore, G is a better solution for DPM than G'. 

Fig. 1. Example of a DPM instance. 



In this paper we present an Integer Programming (IP) formulation for DPM 
and introduce some valid inequalities for the convex hull of integer solutions. 
These inequalities form the basis of a branch-and-cut (B&C) algorithm that 
we implemented. The contributions of our work are twofold. First, the B&C 
algorithm was able to compute lower bounds for a set of DPM instances, allowing 
us to assess the performance of the heuristic proposed by Moreano et al. [1], one 
of the best suboptimal algorithms available for DPM. Second, the B&C algorithm 
also proved to be a valuable tool to solve small-sized DPM instances to optimality. 

The paper is organized as follows. The next section gives a formal descrip- 
tion of DPM in terms of Graph Theory. Section 3 briefly discusses Moreano’s 
heuristic. Section 4 presents an IP formulation for DPM, together with some 
classes of valid inequalities that can be used to tighten the original model. In 
Sect. 5 we report our computational experiments with the B&C algorithm and 
analyze the performance of Moreano’s heuristic. Finally, in Sect. 6 we draw some 
conclusions and point to directions for future investigation. 



2 A Graph Model for DPM 

In this section we formulate DPM as a graph optimization problem. The input 
is assumed to be composed of n datapaths corresponding to application loops 
of a computer program. The goal is to find a merging of those datapaths into a 
reconfigurable one that is able to work as each individual loop datapath alone and 
has as few hardware blocks (functional units and registers) and interconnections 
as possible. That is, the reconfigurable datapath must be capable of performing 
the computation of each loop, multiplexed in time. 

The $i$-th datapath is modeled as a directed graph $G_i = (V_i, E_i)$, where the
vertices in $V_i$ represent the hardware blocks in the datapath, and the arcs in $E_i$
are associated to the interconnections between the hardware blocks. The types of
hardware blocks (e.g. adders, multipliers, registers, etc.) are modeled through a
labeling function $\pi_i : V_i \to \mathcal{T}$, where $\mathcal{T}$ is the set of labels representing hardware
block types. For each vertex $u \in V_i$, $\pi_i(u)$ is the type of the hardware block
associated to $u$. A reconfigurable datapath representing a solution of DPM can
also be modeled as a directed graph $G = (V, E)$ together with a labeling function
$\pi : V \to \mathcal{T}$. In the final graph $G$, given $i \in \{1, \ldots, n\}$, there exists a mapping
$\mu_i$ which associates every vertex of $V_i$ to a distinct vertex in $V$. This mapping is
such that, if $v \in V_i$, $u \in V$ and $\mu_i(v) = u$, then $\pi_i(v) = \pi(u)$. Moreover, whenever
the arc $(v, v')$ is in $E_i$, the arc $(\mu_i(v), \mu_i(v'))$ must be in $E$. If $G$ is an optimal
solution for DPM it satisfies two conditions: (a) for all $T \in \mathcal{T}$, the number of
vertices of $G$ with label $T$ is equal to the maximum number of vertices with that
label encountered across all datapaths $G_i$; and (b) $|E|$ is minimum. Condition
(a) forces the usage of as few hardware blocks as possible in the reconfigurable
datapath. As cited before, this is a requirement of the practitioners.
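
The conditions above can be checked mechanically. The following Python sketch is only an illustration under our own assumptions: graphs are stored as a dictionary of vertex labels plus a set of arcs, and the names is_valid_merge and uses_minimum_blocks are hypothetical, not taken from the paper.

```python
from collections import Counter


def is_valid_merge(inputs, merged, mappings):
    """Check the mapping conditions of the graph model.

    inputs   -- list of (labels_i, arcs_i); labels_i maps vertex -> type
    merged   -- (labels, arcs) describing the candidate graph G
    mappings -- list of dicts mu_i: vertex of G_i -> vertex of G
    """
    labels, arcs = merged
    for (labels_i, arcs_i), mu in zip(inputs, mappings):
        # every vertex of V_i is mapped, and to a distinct vertex of V
        if set(mu) != set(labels_i) or len(set(mu.values())) != len(mu):
            return False
        # hardware block types are preserved by the mapping
        if any(labels_i[v] != labels.get(mu[v]) for v in labels_i):
            return False
        # every arc of E_i has its image in E
        if any((mu[v], mu[w]) not in arcs for v, w in arcs_i):
            return False
    return True


def uses_minimum_blocks(inputs, merged):
    """Condition (a): per label, G has exactly the maximum count seen in any G_i."""
    labels, _ = merged
    need = Counter()
    for labels_i, _ in inputs:
        for t, c in Counter(labels_i.values()).items():
            need[t] = max(need[t], c)
    return Counter(labels.values()) == need


# Toy instance (invented for illustration): G_1 is A -> B, G_2 is A -> B -> A.
g1 = ({"a1": "A", "b1": "B"}, {("a1", "b1")})
g2 = ({"a2": "A", "b2": "B", "c2": "A"}, {("a2", "b2"), ("b2", "c2")})
merged = ({1: "A", 2: "B", 3: "A"}, {(1, 2), (2, 3)})
mappings = [{"a1": 1, "b1": 2}, {"a2": 1, "b2": 2, "c2": 3}]
```

In this toy instance the arc (a1, b1) of G_1 and the arc (a2, b2) of G_2 share the arc (1, 2) of G, which is exactly the kind of overlap that condition (b) rewards.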

3 Moreano’s Heuristic for DPM 

Since DPM is NP-hard, it is natural to devise suboptimal algorithms that can
solve it fast, preferably in polynomial time. In Moreano et al. [1], the authors
proposed a heuristic for DPM and gave comparative results showing that it
outperforms other heuristics presented in the literature. Moreano’s heuristic (MH)
is briefly described in this section. In Sect. 5, rather than assess the efficiency of
MH using upper bounds generated with other methods, we compare its solutions
with strong lower bounds computed via the IP model discussed in Sect. 4.

For an integer $k > 1$, define $k$-DPM as the DPM problem whose input is
made of $k$ loop datapaths. Thus, the original DPM problem would be denoted
by $n$-DPM but the former notation is kept for simplicity. MH is based on an
algorithm for 2-DPM, here denoted by 2DPMalg, that is presented below.

Let $G_1 = (V_1, E_1)$ and $G_2 = (V_2, E_2)$ be the input graphs and $\pi_1$ and $\pi_2$ their
respective labeling functions. A pair of arcs $\{(u, v), (w, z)\}$ in $E_1 \times E_2$ is said to
form a feasible mapping if $\pi_1(u) = \pi_2(w)$ and $\pi_1(v) = \pi_2(z)$. The first step of
2DPMalg constructs the compatibility graph $H = (W, F)$ of $G_1$ and $G_2$. The graph
$H$ is undirected. The vertices in $W$ are in one-to-one correspondence with the
pairs of arcs in $E_1 \times E_2$ which form feasible mappings. Given two vertices $a$ and $b$
in $W$ represented by the corresponding feasible mappings, say $a = \{(u, v), (w, z)\}$
and $b = \{(u', v'), (w', z')\}$, the edge $(a, b)$ is in $F$ except if one of the following
conditions holds: (i) $u = u'$ and $w \neq w'$, or (ii) $v = v'$ and $z \neq z'$, or (iii) $u \neq u'$ and
$w = w'$, or (iv) $v \neq v'$ and $z = z'$. If the edge $(a, b)$ is in $F$, the feasible mappings
that they represent are compatible, explaining why $H$ is called the compatibility
graph. Now, as explained in [1], an optimal solution for 2-DPM can be computed
by solving the maximum clique problem on $H$. The solution of DPM is easily
derived from an optimal clique of $H$ since the feasible mappings associated to
the vertices of this graph provide the proper matchings of the vertices of $G_1$ and
$G_2$. However, it is well-known that the clique problem is NP-hard. Thus, the
approach used in MH is to apply a good heuristic available for cliques to solve
2-DPM. Later in Sect. 5, we discuss how this is done in practice.
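
As a rough illustration of the first step of 2DPMalg, the Python sketch below builds the compatibility graph from conditions (i)-(iv) exactly as stated above and then finds a maximum clique by exhaustive search. The data layout and the function names are our own assumptions, and the brute-force clique search is a stand-in that is only viable for tiny graphs; MH itself relies on a clique heuristic.

```python
from itertools import combinations


def compatibility_graph(labels1, arcs1, labels2, arcs2):
    """Return (W, F): feasible mappings and the compatible pairs among them."""
    # a vertex of H is a pair of arcs whose end-vertex types agree
    W = [((u, v), (w, z)) for (u, v) in arcs1 for (w, z) in arcs2
         if labels1[u] == labels2[w] and labels1[v] == labels2[z]]

    def compatible(a, b):
        (u, v), (w, z) = a
        (u2, v2), (w2, z2) = b
        # conditions (i)-(iv) of the text describe the incompatible cases
        return not ((u == u2 and w != w2) or (v == v2 and z != z2)
                    or (u != u2 and w == w2) or (v != v2 and z == z2))

    F = {frozenset((a, b)) for a, b in combinations(W, 2) if compatible(a, b)}
    return W, F


def maximum_clique(W, F):
    """Exhaustive search, largest candidate sets first (exponential time)."""
    for size in range(len(W), 0, -1):
        for cand in combinations(W, size):
            if all(frozenset(p) in F for p in combinations(cand, 2)):
                return list(cand)
    return []


# Toy instance (invented): G_1 is A -> B -> C; G_2 has two A vertices feeding B.
labels1 = {"a": "A", "b": "B", "c": "C"}
arcs1 = [("a", "b"), ("b", "c")]
labels2 = {"p": "A", "q": "B", "r": "C", "s": "A"}
arcs2 = [("p", "q"), ("q", "r"), ("s", "q")]
W, F = compatibility_graph(labels1, arcs1, labels2, arcs2)
clique = maximum_clique(W, F)
```

Here the two mappings that send vertex a to p and to s, respectively, are incompatible by condition (i), so any maximum clique contains only one of them together with the mapping of arc (b, c) onto (q, r).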

Fig. 2. Example of a 2-DPM instance.

Before we continue, let us give an example of the ideas discussed in the
preceding paragraph. To this end, consider the graphs $G_1$ and $G_2$ in Fig. 2
representing an instance of 2-DPM. According to the notation used in this figure,
each vertex $u$ in a graph $G_i$ is identified with a label $T_{ij}$, which denotes that $u$ is
the $j$-th vertex of $G_i$ and $\pi_i(u) = T$. For instance, $A_{12}$ is the second vertex of $G_1$
which has type $A$. This notation is used in other figures representing DPM
instances and solutions throughout. Figure 3 depicts the compatibility graph $H$ of
$G_1$ and $G_2$. Consider, for example, the feasible mappings $\{(A_{11}, B_{11}), (A_{21}, B_{21})\}$
(vertex $w_1$ in $H$) and $\{(B_{11}, C_{11}), (B_{21}, C_{21})\}$ (vertex $w_5$ in $H$). For those
mappings, no vertex from $G_1$ maps onto two distinct vertices in $G_2$ and vice-versa.
As a result, these two mappings are compatible, and an edge $(w_1, w_5)$ is required
in H . On the other hand, no edge exists in H between vertices V02 and w^. The 
reason is that the mappings represented by these vertices are incompatible, since 
otherwise vertex $A_{11}$ in $G_1$ would map onto both $A_{22}$ and $A_{23}$ in $G_2$.

A maximum clique of the compatibility graph $H$ in Fig. 3 is given by vertices
$w_1$, $w_4$ and $w_5$. An optimal solution $G$ for 2-DPM can be easily built from this
clique. The resulting graph $G$ is shown in Fig. 3 and is obtained as follows. First,
we consider the vertices of the clique. For instance, for vertex $w_1$, which represents
the feasible mapping $\{(A_{11}, B_{11}), (A_{21}, B_{21})\}$, we add to $G$ two vertices $u_1$ and $u_2$
corresponding respectively to the mapped vertices $\{A_{11}, A_{21}\}$ and $\{B_{11}, B_{21}\}$.
Moreover, we also include in $G$ the arc $(u_1, u_2)$ to represent the feasible mapping
associated to $w_1$. Analogous operations are now executed for vertices $w_4$ and $w_5$.
The former vertex is responsible for the addition of vertices $u_4$ and $u_5$ and of arc
$(u_4, u_5)$ in $G$ while the latter gives rise to the addition of arc $(u_2, u_4)$. Finally,
we add to $G$ the vertex $u_3$ corresponding to the non-mapped vertex $A_{22}$ from
$G_2$, and the arcs $(u_1, u_4)$, $(u_4, u_3)$ and $(u_3, u_2)$ corresponding respectively to arc
$(A_{11}, C_{11})$ from $G_1$ and arcs $(C_{21}, A_{22})$ and $(A_{22}, B_{21})$ from $G_2$.

Fig. 3. Compatibility graph and an optimal solution for the 2-DPM instance of Fig. 2.

Back to MH, we now show how it uses algorithm 2DPMalg as a building-block
for getting suboptimal solutions for DPM. MH starts by applying 2DPMalg to
graphs $G_1$ and $G_2$ with labeling functions $\pi_1$ and $\pi_2$, respectively. The output
is a graph $G$ and a labeling function $\pi$. At each iteration $i$, $i \in \{3, \ldots, n\}$, MH
applies 2DPMalg to graphs $G$ and $G_i$ and their functions $\pi$ and $\pi_i$. After all these
pairwise matchings have been completed, the graph $G$ is returned.



4 Integer Linear Programming Exact Solution 



A natural question that arises when one solves a hard problem heuristically
is how far the solutions are from the true optimum. For 2-DPM, algorithm
2DPMalg from Sect. 3 can be turned into an exact method, provided that an exact
algorithm is used to find maximum cliques. However, this approach only works
when merging two datapaths. A naive extension of the method to encompass the
general case requires the solution of hard combinatorial problems on large-sized
instances which cannot be handled in practice. As an alternative, in this section
we derive an IP model for DPM. The aim is to solve that model to optimality
via IP techniques whenever the computational resources available permit. When
this is not the case, we would like at least to generate good lower bounds that
allow us to assess the quality of the solutions produced by MH.

Let us denote by $a_i$ the $i$-th type of hardware block and assume that $\mathcal{T}$ has
$m$ elements, i.e., $\mathcal{T} = \{a_1, \ldots, a_m\}$. Moreover, for every $i \in \{1, \ldots, n\}$ and
every $t \in \{1, \ldots, m\}$, let us define $b_{it}$ as the number of vertices in $V_i$ associated
with a hardware block of type $a_t$ and let $q(t) = \max\{b_{it} : 1 \leq i \leq n\}$. Then,
the solutions of DPM are graphs with $k$ vertices, where $k = \sum_{t=1}^{m} q(t)$. In the
remainder of the text, we denote by $K$ and $N$ the sets $\{1, \ldots, k\}$ and $\{1, \ldots, n\}$,
respectively. Besides, we assume that for every hardware block of type $a_t$ in $\mathcal{T}$
there exists $i \in N$ and $u \in V_i$ such that $\pi_i(u) = a_t$.

When $V$ is given by $\{v_1, v_2, \ldots, v_k\}$, we can assume without loss of generality
that $\pi(v_1) = \ldots = \pi(v_{q(1)}) = a_1$, $\pi(v_{q(1)+1}) = \ldots = \pi(v_{q(1)+q(2)}) = a_2$ and so
on. In other words, $V$ is such that the first $q(1)$ vertices are assigned to label
$a_1$, the next $q(2)$ vertices are assigned to label $a_2$ and so on. This assumption
considerably reduces the symmetry of the IP model, making it easier to solve.
Below we use the notation $J_t$ to denote the subset of indices in $K$ for which
$\pi(v_j) = a_t$ (e.g., $J_1 = \{1, \ldots, q(1)\}$ and $J_2 = \{q(1) + 1, \ldots, q(1) + q(2)\}$). We
also write $J(a_t)$ for $J_t$.
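
The quantities $q(t)$, $k$ and the index sets $J_t$ follow directly from the label counts of the input graphs. The short Python sketch below shows one way to compute them; the function name label_blocks and the dictionary-based representation are assumptions made only for this illustration.

```python
from collections import Counter


def label_blocks(input_labels, label_order):
    """Compute q(t), k and the index sets J_t.

    input_labels -- one dict per input graph, mapping vertex -> label
    label_order  -- the labels a_1, ..., a_m in the order fixed for V
    """
    counts = [Counter(lbls.values()) for lbls in input_labels]
    # q(t): largest number of vertices with label a_t over all input graphs
    q = {t: max(c[t] for c in counts) for t in label_order}
    J, start = {}, 1
    for t in label_order:
        # consecutive block of output indices reserved for label a_t
        J[t] = list(range(start, start + q[t]))
        start += q[t]
    k = start - 1
    return q, k, J


# Toy data (invented): two input graphs over the labels A, B, C.
q, k, J = label_blocks(
    [{"a": "A", "b": "B"}, {"p": "A", "q": "A", "r": "C"}],
    ["A", "B", "C"],
)
```

For this toy data the output graph needs two vertices of type A and one each of types B and C, so $k = 4$.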

We are now ready to define the binary variables of our model. For every triple
$(i, u, j)$ with $i \in N$, $u \in V_i$ and $j \in J(\pi_i(u))$, let $x_{uij}$ be one if and only if the
vertex $u$ of $V_i$ is mapped onto the vertex $v_j$ of $V$. Moreover, for any pair $(j, j')$
of distinct elements in $K$, let $y_{jj'}$ be one if and only if there exists $i \in N$ and an
arc in $E_i$ such that one of its end-vertices is mapped onto vertex $v_j$ of $V$ while
the other end-vertex is mapped onto $v_{j'}$. The IP model is then the following.



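\begin{align}
\min\; z = \sum_{j \in K} \sum_{j' \in K \setminus \{j\}} y_{jj'} & \tag{1} \\
x_{uij} + x_{u'ij'} - y_{jj'} \leq 1 & \qquad \forall i \in N,\ \forall (u, u') \in E_i,\ \forall j \in J(\pi_i(u)),\ \forall j' \in J(\pi_i(u')),\ j \neq j' \tag{2} \\
\sum_{u \in V_i \,:\, j \in J(\pi_i(u))} x_{uij} \leq 1 & \qquad \forall i \in N,\ \forall j \in K \tag{3} \\
\sum_{j \in J(\pi_i(u))} x_{uij} = 1 & \qquad \forall i \in N,\ \forall u \in V_i \tag{4} \\
y_{jj'} \in \{0, 1\} & \qquad \forall j, j' \in K,\ j \neq j' \tag{5} \\
x_{uij} \in \{0, 1\} & \qquad \forall i \in N,\ \forall u \in V_i,\ \forall j \in J(\pi_i(u)) \tag{6}
\end{align}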



Equation (1) expresses the fact that an optimal solution to DPM is a graph
with as few arcs as possible. Constraints (2) force the existence of arcs in the
output graph. Constraints (3) prevent multiple vertices of one input graph from
being mapped to a single vertex of the output graph. Finally, (4) guarantees that
any vertex in any input graph is mapped to exactly one vertex of $V$.
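
To make the meaning of the model concrete, the following Python sketch solves tiny instances by plain enumeration rather than by IP techniques. It enumerates every assignment of the $x$ variables that satisfies (3), (4) and (6), sets each $y_{jj'}$ to one only when (2) forces it, and keeps the assignment with the fewest arcs. The function name, the data layout and the assumption that the input graphs have no self-loops are ours; the enumeration is exponential and is meant for illustration only.

```python
from itertools import permutations, product


def solve_dpm_bruteforce(inputs, J):
    """inputs: list of (labels_i, arcs_i); J: label -> list of output indices.

    Returns (minimum number of arcs, tuple of mappings mu_i).
    Assumes the input graphs contain no self-loops.
    """
    def injective_maps(labels_i):
        # group the vertices of G_i by label
        by_label = {}
        for u, t in labels_i.items():
            by_label.setdefault(t, []).append(u)
        # per label, every injective assignment into J[t] (constraints 3 and 4)
        per_label = [[dict(zip(us, perm)) for perm in permutations(J[t], len(us))]
                     for t, us in by_label.items()]
        for combo in product(*per_label):
            mu = {}
            for part in combo:
                mu.update(part)
            yield mu

    best = (float("inf"), None)
    for maps in product(*[list(injective_maps(lbls)) for lbls, _ in inputs]):
        # y_{jj'} = 1 exactly for the images of input arcs (constraint 2)
        arcs = {(mu[u], mu[v])
                for (_, arcs_i), mu in zip(inputs, maps) for u, v in arcs_i}
        if len(arcs) < best[0]:
            best = (len(arcs), maps)
    return best


# Toy instance (invented): G_1 is A -> B; G_2 has two A vertices feeding one B.
g1 = ({"a": "A", "b": "B"}, [("a", "b")])
g2 = ({"p": "A", "q": "A", "r": "B"}, [("p", "r"), ("q", "r")])
value, maps = solve_dpm_bruteforce([g1, g2], {"A": [1, 2], "B": [3]})
```

In the toy instance the single arc of G_1 always overlaps one of the two arcs of G_2, so two arcs suffice in the merged graph.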

Notice that (5) can be replaced by inequalities of the form $0 \leq y_{jj'} \leq 1$ for
all $j \neq j'$ with $(j, j') \in K \times K$. This is so because the objective function together
with (2) forces the $y$ variables to assume values at the limits of the interval $[0, 1]$
and, therefore, to be integer-valued. This remark is important for computational
purposes. The most successful algorithms implemented in commercial solvers for
IP are based on branch-and-bound (B&B) algorithms. The size of the solution
space increases exponentially with the number of integer variables in the model.
Thus, by relaxing the integrality constraints on the $y$ variables in our model, we
reduce the search space and increase the chances of success of the algorithm.

The solution of hard combinatorial problems through IP algorithms relies 
largely on the quality of the dual bounds produced by the linear relaxation of 
the model at hand. To improve the dual bounds, the relaxation can be amended 
with additional constraints that are valid for integer solutions of the relaxation 
but not for all the continuous ones. This addition of valid inequalities tightens
the relaxation because its feasible set strictly shrinks. The new constraints,
typically chosen from a particular class of valid inequalities, can be either included
a priori in the model, which is then solved by a standard B&B algorithm, or 
generated on the fly during the enumeration procedure whenever they are vio- 
lated by the solution of the current relaxation. The latter method gives rise to 
B&C algorithms for IP. Quite often the use of B&C is justified by the num- 
ber of potential inequalities that can be added to the model which, even for 
limited classes of valid inequalities, is exponentially large. On the other hand, 
when inequalities are generated on the fly, algorithms that search for violated 
inequalities are needed. These algorithms solve the so-called separation problem 
for classes of valid inequalities and are named separation routines. For a thor- 
ough presentation of the Theory of Valid Inequalities and IP in general, we refer 
to the book by Nemhauser and Wolsey [11]. In the sequel we present two classes 
of valid inequalities that we use to tighten the formulation given in (1)-(6).







4.1 The Complete Bipartite Subgraph (CBS) Inequalities 

The idea is to strengthen (2) using (3). This is done through special subgraphs of
the input graphs. Given a directed graph $D$, we call a subgraph $H = (W_1, W_2, F)$
a CBS of $D$ if, for every pair of vertices $\{w_1, w_2\}$ in $W_1 \times W_2$, $(w_1, w_2)$ is in $F$.
Now, consider an input graph $G_i$, $i \in N$, of a DPM instance and two distinct
labels $a_{t_1}, a_{t_2} \in \mathcal{T}$. Let $H = (W_1, W_2, F)$ be a CBS of $G_i$ such that all
vertices in $W_1$ ($W_2$) have label $a_{t_1}$ ($a_{t_2}$). Suppose that $H$ is maximal with
respect to vertex inclusion. Assume that $v_j$ and $v_{j'}$ are two vertices in $V$, the
vertex set of the resulting graph $G$, with labels $a_{t_1}$ and $a_{t_2}$, respectively. The
CBS inequality associated to $H$, $v_j$ and $v_{j'}$ is

$$\sum_{u \in W_1} x_{uij} + \sum_{u' \in W_2} x_{u'ij'} - y_{jj'} \leq 1. \tag{7}$$

Theorem 1. (7) is valid for all integer solutions of the system (2)-(6).

Proof. Due to (3), the first summation in the left-hand side (LHS) of (7) cannot
exceed one. A similar result holds for the second summation. Thus, if an integer
solution exists violating (7), both summations in the LHS have to be one. But
then, there would be a pair of vertices $\{u, u'\}$ in $W_1 \times W_2$ such that $u$ is mapped
onto vertex $v_j$ and $u'$ onto vertex $v_{j'}$. However, as $H$ is a CBS of $G_i$, $(u, u')$ is
an arc of $G_i$ and, therefore, $(v_j, v_{j'})$ must be an arc of $E$, the arc set of the
output graph $G$, i.e., $y_{jj'}$ is one. □

Clearly, if $(u, u')$ is not a maximal CBS of $G_i$, then (2) is dominated by some
inequality in (7) and, therefore, superfluous. Our belief is that the number of
CBS inequalities is exponentially large, which, in principle, would argue against
adding them all to the initial IP model. However, the DPM instances we tested
reveal that, in practical situations, this number is actually not too large and the
CBS inequalities can all be generated via a backtracking algorithm. This allows
us to test B&B algorithms on models with all these inequalities present.
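
For small graphs, the maximal CBSs for a given ordered pair of labels can be listed with a simple closure computation. The Python sketch below is not the backtracking algorithm mentioned above; it is a brute-force substitute, with names and data layout chosen by us, that enumerates every non-empty subset of the vertices with the first label and closes it on both sides.

```python
from itertools import combinations


def maximal_cbs(labels, arcs, t1, t2):
    """Return the set of maximal (W1, W2) with labels t1, t2 and W1 x W2 in arcs."""
    arcs = set(arcs)
    left = [u for u in labels if labels[u] == t1]
    right = [u for u in labels if labels[u] == t2]
    found = set()
    for r in range(1, len(left) + 1):
        for S in combinations(left, r):
            # common out-neighbours of S among the t2-labelled vertices
            W2 = frozenset(w for w in right if all((u, w) in arcs for u in S))
            if not W2:
                continue
            # every t1-labelled vertex pointing to all of W2; contains S
            W1 = frozenset(u for u in left if all((u, w) in arcs for w in W2))
            # (W1, W2) is closed on both sides, hence maximal
            found.add((W1, W2))
    return found


# Toy graph (invented): a1 feeds b1 and b2, a2 feeds only b1.
labels = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}
arcs = [("a1", "b1"), ("a1", "b2"), ("a2", "b1")]
cbs = maximal_cbs(labels, arcs, "A", "B")
```

The toy graph has two maximal CBSs, ({a1}, {b1, b2}) and ({a1, a2}, {b1}); each would give one inequality (7) per pair of output vertices with labels A and B.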

4.2 The Partition (PART) Inequalities 

The next inequalities generalize (2). Consider the $i$-th input graph $G_i = (V_i, E_i)$,
$i \in \{1, \ldots, n\}$. Let $u$ and $u'$ be two vertices in $V_i$ with labels $a$ and $a'$,
respectively, with $a \neq a'$ and $(u, u') \in E_i$. Again assume that $G = (V, E)$ is the output
graph and that $v_j$ is a vertex of $V$ with label $a$. Finally, suppose that $A$ and
$B$ form a partition of the set $J(\pi_i(u'))$ (see definition in Sect. 4). The PART
inequality corresponding to $u$, $u'$, $v_j$, $A$ and $B$ is

$$x_{uij} - \sum_{j' \in A} y_{jj'} - \sum_{j' \in B} x_{u'ij'} \leq 0. \tag{8}$$

Theorem 2. (8) is valid for all integer solutions of the system (2)-(6).

Proof. If $u'$ is mapped onto a vertex $v_{j'}$ of the resulting graph $G$ and $j'$ is in
$B$, (8) reduces to $x_{uij} - \sum_{j' \in A} y_{jj'} - 1 \leq 0$, which is obviously true since $x_{uij} \leq 1$
and $y_{jj'} \geq 0$ for all $j' \in A$. On the other hand, if $u'$ is mapped onto a vertex $v_{j'}$
of $G$ with $j'$ in $A$, the last summation in (8) is null and the inequality becomes
$x_{uij} - \sum_{j' \in A} y_{jj'} \leq 0$. If vertex $u$ is not mapped onto vertex $v_j$, the latter
inequality is trivially satisfied. Otherwise, there must be an arc in
$G$ joining vertex $v_j$ to some vertex $v_{j'}$ of $G$ with $j'$ in $A$. This implies that the
summation of the $y$ variables in (8) is at least one and, therefore, the inequality holds. □

Notice that using (4) we can rewrite (8) as
$x_{uij} + \sum_{j' \in A} x_{u'ij'} - \sum_{j' \in A} y_{jj'} \leq 1$
which, for $A = \{j'\}$, is nothing but (2). Moreover, since the size of $J(\pi_i(u'))$, in
the worst case, is linear in the total number of vertices of all input graphs, there
can be exponentially many PART inequalities. However, the separation problem
for these inequalities can be solved in polynomial time. This is the ideal situation
because, according to the celebrated Grötschel-Lovász-Schrijver theorem [12], the dual
bound of the relaxation of (2)-(4) and all inequalities in (8) is computable in
polynomial time using the latter inequalities as cutting-planes. A pseudo-code
for the separation routine of (8) is shown in Fig. 4 and is now explained.

Given an input graph $G_i$, $i \in N$, consider two vertices $u$ and $u'$ such that
$(u, u')$ is an arc of $G_i$ and a vertex $v_j$ of $G$ whose label is identical to that of $u$. Now, let
$(x^*, y^*)$ be an optimal solution of a linear relaxation during the B&C algorithm.
The goal is to find the partition of the set $J(\pi_i(u'))$ that maximizes the LHS of
(8). It can be easily verified that, with respect to the point $(x^*, y^*)$ and the input
parameters $i$, $u$, $u'$ and $j$, the choice made in line 4 ensures that the LHS of (8) is
maximized. Thus, if the value of the LHS computed for the partition returned in line
7 is non-positive, no constraint of the form (8) is violated; otherwise, $(A, B)$ is
the partition that produces the most violated PART inequality for $(x^*, y^*)$. This
routine is executed for all possible sets of input parameters. The number of such
sets can be easily shown to be polynomial in the size of the input. Moreover, since
the complexity of the routine is $O(|J(\pi_i(u'))|)$ which, in turn, is $O(\sum_{i \in N} |V_i|)$, the
identification of all violated PART inequalities can be done in polynomial time.



procedure part-separation-routine($x^*$, $y^*$, $i$, $u$, $u'$, $j$);
1   $A \leftarrow \emptyset$;
2   $B \leftarrow \emptyset$;
3   for all $j' \in J(\pi_i(u'))$ do {
4       if ($y^*_{jj'} < x^*_{u'ij'}$) then $A \leftarrow A \cup \{j'\}$;
5       else $B \leftarrow B \cup \{j'\}$;
6   }
7   return $(A, B)$;
end part-separation-routine.

Fig. 4. Separation routine for PART inequalities.
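
The routine of Fig. 4 translates almost line by line into Python. In the sketch below the fractional point is stored in two dictionaries keyed by $(u, i, j)$ and $(j, j')$, with missing entries read as zero; this storage scheme, the function name and the extra returned LHS value are our own assumptions for illustration.

```python
def part_separation(x_star, y_star, i, u, u_prime, j, J_u_prime):
    """Return (A, B, lhs) for the most violated PART inequality (8).

    x_star    -- dict (vertex, graph index, output index) -> value of x
    y_star    -- dict (j, j') -> value of y
    J_u_prime -- the index set J(pi_i(u'))
    """
    A, B = set(), set()
    for jp in J_u_prime:
        # put j' on the side that subtracts the smaller value from the LHS
        if y_star.get((j, jp), 0.0) < x_star.get((u_prime, i, jp), 0.0):
            A.add(jp)
        else:
            B.add(jp)
    lhs = (x_star.get((u, i, j), 0.0)
           - sum(y_star.get((j, jp), 0.0) for jp in A)
           - sum(x_star.get((u_prime, i, jp), 0.0) for jp in B))
    # the inequality is violated by (x*, y*) exactly when lhs > 0
    return A, B, lhs


# Toy fractional point (invented): u is fully mapped onto v_1, u' is split
# between v_2 and v_3.
x_star = {("u", 0, 1): 1.0, ("w", 0, 2): 0.5, ("w", 0, 3): 0.5}
y_star = {(1, 2): 0.2, (1, 3): 0.6}
A, B, lhs = part_separation(x_star, y_star, 0, "u", "w", 1, [2, 3])
```

For the toy point the routine returns $A = \{2\}$ and $B = \{3\}$ with a positive LHS, so the corresponding PART inequality would be added as a cut.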
5 Computational Experiments

We now report on our computational tests with a set of benchmark instances
generated from real applications from the MediaBench suite [13]. All programs
were implemented in C++ and executed on a DEC machine equipped with an
ALPHA processor of 675 MHz, 4 GB of RAM and running under a native Unix
operating system. The linear programming solver used was CPLEX 7.0 and
separation routines were coded as callback functions from the solver’s callable library.

The program implementing heuristic MH resorts to the algorithm of Battiti
and Protasi [14] to find solutions to the clique problem. The authors’ code,
which was used in our implementation, can be downloaded from [15] and allows
the setting of some parameters. Among them, the most relevant to us is the
maximum computation time. Our tests reveal that running the code with this
parameter set to one second produces the same results as if we had fixed it to 10
or 15 seconds. Unless otherwise specified, all results exhibited here were obtained
for a maximum computation time of one second. This means that the MH
heuristic as a whole had just a couple of seconds to seek a good solution.

The B&B and B&C codes that solve the IP models also had their
computation times limited. In this case, the upper bound was set to 3600 seconds.
B&B refers to the basic algorithm implemented in CPLEX having the system
(1)-(6) as input. The results of B&B are identified by the “P” extension in the
instance names. The B&C algorithms are based on a naive implementation. The
only inequalities we generated on the fly are the PART inequalities. The
separation routine from Fig. 4 is run until it is unable to find an inequality that
is violated by the solution of the current relaxation. So, new PART constraints
are generated exhaustively, i.e., no attempt is made to prevent the well-known
stalling effects observed in cutting plane algorithms. A simple rounding
heuristic is also used to look for good primal bounds. The heuristic is executed at
every node of the enumeration tree. The two versions of B&C differ only in
the input model, which may or may not include the set of CBS inequalities. As
mentioned earlier, when used, the CBS inequalities are all generated a priori by a
simple backtracking algorithm. The first (second) version of the B&C algorithm uses
the system (1)-(6) (amended with CBS inequalities) as input and its results are
identified by the “HC” (“HCS”) extension in the instance names. It should be
noticed that, both in the B&B and in the B&C algorithms, the generation of standard
valid inequalities provided by the solver is allowed. In fact, Gomory cuts were
added by CPLEX in all cases but had almost no impact on the dual bounds.

Table 1 summarizes the characteristics of the instances in our data set.
Columns “$k$” and “$\ell$” refer respectively to the number of vertices and labels
of the output graph $G$. Columns “$G_i$”, $i \in \{1, \ldots, 4\}$, display the features of
each input graph of the instance. For each input graph, the columns “$k_i$”, “$e_i$”
and “$\ell_i$” denote the number of vertices, arcs and different labels respectively.
The last two columns give the totals $\sum k_i$ and $\sum e_i$ over the input graphs.

Table 1. Characteristics of the instances.

                            G_1             G_2             G_3             G_4
Instance name    k   ℓ    k_1  e_1  ℓ_1   k_2  e_2  ℓ_2   k_3  e_3  ℓ_3   k_4  e_4  ℓ_4   Σk_i  Σe_i
adpcm          108  20     97  138   20    78  103   18     -    -    -     -    -    -    175   241
epic_decode     24   6     16   17    3    16   15    4    15   13    4    12   11    5     59    56
epic_encode     39   8     36   43    7    16   17    3    13   13    5    11   10    4     76    83
g721            57   6     57   66    6    19   18    4     -    -    -     -    -    -     76    84
gsm_decode      92   8     91  102    8    65   69    8    20   20    5    19   18    4    195   209
gsm_encode      48   8     46   57    7    41   48    7    20   21    5    19   17    6    126   143
jpeg_decode    104   5    101  111    5    61   68    5     -    -    -     -    -    -    162   179
jpeg_encode     47   7     46   55    7    31   32    5    28   32    6     -    -    -    105   119
mpeg2_decode    34   6     32   31    4    24   29    6    15   15    4    11   10    4     82    85
mpeg2_encode    32   7     31   39    6    30   37    6    20   22    5    18   16    5     99   114
pegwit          41   7     40   46    7    27   28    6    27   30    5     -    -    -     94   104

Table 2 exhibits the results we obtained. The first column contains the
instance name followed by the extension specifying the algorithm to which the
data in the row correspond. The second column reports the CPU time in seconds.
We do not report on the specific time spent on generating cuts since it is
negligible compared to that of the enumeration procedure. The third and fourth columns
contain respectively the dual and primal bounds when the algorithm stopped.
Column “MH” displays the value of the solution obtained by Moreano’s
heuristic. To calculate these solutions, MH spent no more than 15 seconds on each
problem and mpeg2_encode was the only instance in which the clique procedure
was allowed to run for more than 10 seconds. Column “gap” gives the
percentage gap between the value in column “MH” and that of column “DB” rounded
up. Finally, the last two columns show respectively the total numbers of nodes
explored in the enumeration tree and of PART inequalities added to the model.
For each instance, the largest dual bound and the smallest gap are indicated in
bold. Ties are broken by the smallest CPU time.
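
The “gap” column can be recomputed from the “MH” and “DB” columns. The Python sketch below assumes, based on the column description and on the fact that it reproduces the entries we checked, that the gap is measured relative to the dual bound rounded up to the next integer; the function name mh_gap is ours.

```python
import math


def mh_gap(mh_value, dual_bound):
    """Percentage gap between the MH solution and the rounded-up dual bound."""
    lb = math.ceil(dual_bound)  # arc counts are integers, so round DB up
    return 100.0 * (mh_value - lb) / lb


# Values taken from rows adpcm.P, gsm_decode.P and epic_encode.P of Table 2.
gaps = [round(mh_gap(168, 165.50), 2),
        round(mh_gap(120, 105.75), 2),
        round(mh_gap(61, 59.00), 2)]
```

Rounding the dual bound up is legitimate because every feasible solution has an integer number of arcs.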

By inspecting Tab. 2, one can see that MH produces very good solutions. It
solved 4 out of the 11 instances optimally. We took into account here that further
testing with the IP codes proved that 60 is indeed the optimal value of instance
pegwit. When a gap existed between MH’s solution and the best dual bound,
it always remained below 10%. Additional runs with larger instances showed
that the gaps tend to increase, though they never exceeded 30%. However, this
is more likely due to the steep decrease in performance of the IP codes than
to MH. Instance epic_encode was the only case where an optimal solution was
found that did not coincide with that generated by MH. Nevertheless, the gap
observed in this problem can be considered quite small: 3.39%.

Comparing the gaps computed by the alternative IP codes, we see that the
two B&C codes outperform the pure B&B code. Only in two instances did the
pure B&B code beat both cut generation codes. The strength of inequalities
PART and CBS can be assessed by checking the number of nodes explored during
the enumeration. This number is drastically reduced when cuts are added, as
can be observed, for instance, for problems adpcm and epic_decode, where
the use of cuts allowed the computation of the optimum at the root node while
B&B explored thousands or hundreds of nodes. Instance g721 was also solved
to optimality by the B&C codes with far fewer nodes than B&B; however,
when the CBS inequalities were not added a priori, this gain did not translate
into an equivalent reduction in computation time. In the remaining cases, where
optimality could not be proved, again we observed that the B&C codes computed
better dual bounds while the number of nodes visited was orders of magnitude
smaller than that of B&B.

Table 2. Computational results.

Instance.extension    time      DB    PB    MH     gap   #nodes   #cuts
adpcm.P               3604  165.50   170   168    1.20     9027       0
adpcm.HC                37  168.00   168   168    0.00        1     813
adpcm.HCS               78  168.00   168   168    0.00        1     848
epic_decode.P            5   33.00    33    33    0.00      143       0
epic_decode.HC           0   33.00    33    33    0.00        1      90
epic_decode.HCS          0   33.00    33    33    0.00        1      74
epic_encode.P         2221   59.00    59    61    3.39   299908       0
epic_encode.HC        3603   57.00    60    61    7.02     1055    4108
epic_encode.HCS       3082   59.00    59    61    3.39     1197    4016
g721.P                 460   70.00    70    70    0.00    50702       0
g721.HC                880   70.00    70    70    0.00      371    1673
g721.HCS               112   70.00    70    70    0.00      116    1259
gsm_decode.P          3616  105.75     -   120   13.21     1601       0
gsm_decode.HC         3614  112.09     -   120    6.19        0    3655
gsm_decode.HCS        3607  110.25     -   120    8.11        0    2002
gsm_encode.P          3604   67.12    80    72    5.88    23346       0
gsm_encode.HC         3606   68.24    73    72    4.35       97    6126
gsm_encode.HCS        3604   68.16    79    72    4.35       23    6094
jpeg_decode.P         3606  117.70   163   137   16.10    10651       0
jpeg_decode.HC        3620  126.20     -   137    7.87        1    4569
jpeg_decode.HCS       3623  125.10     -   137    8.73        1    3669
jpeg_encode.P         3608   61.00    83    71   16.39    63302       0
jpeg_encode.HC        3604   62.92     -    71   12.70        1    5419
jpeg_encode.HCS       3604   64.10     -    71    9.23        1    5620
mpeg2_decode.P        3613   47.03    51    51    6.25   220173       0
mpeg2_decode.HC       3603   46.09    51    51    8.51      310    4657
mpeg2_decode.HCS      3605   47.06    52    51    6.25      455    4555
mpeg2_encode.P        3608   46.50    60    53   12.77    86029       0
mpeg2_encode.HC       3603   47.59    55    53   10.42       68    7420
mpeg2_encode.HCS      3603   48.67    55    53    8.16      115    7778
pegwit.P              3607   58.22    60    60    1.69    81959       0
pegwit.HC             3604   55.69    62    60    7.14      289    6920
pegwit.HCS            3604   57.77    61    60    3.45      202    5843
6 Conclusions and Future Research

In this paper we presented an IP formulation for DPM and introduced valid 
inequalities to tighten this model. Based on this study, we implemented B&C 
and B&B algorithms to assess the performance of Moreano’s heuristic (MH) for 
DPM, which is reported as being one of the best available for the problem. Our 
computational results showed that MH is indeed very effective since it obtains 
high-quality solutions in a matter of just a few seconds of computation. 

The cut generation codes also proved to be a valuable tool to solve some
instances to optimality. However, better and less naive implementations are
possible that may make them more attractive. These improvements are likely to
be achieved, at least in part, by adding tuning mechanisms that allow for a
better trade-off between cut generation and branching. For instance, in
problems gsm_decode, jpeg_decode and jpeg_encode (see Tab. 2), the B&C codes
seemed to get stuck in cut generation since they spent the whole computation
time and were still at the root node. Further evidence of the need for such
tuning mechanisms is given by instances pegwit and g721, where the pure B&B
algorithm was faster than at least one of the B&C codes.

Of course, a possible direction of research would be to perform further
polyhedral investigations since they could give rise to new strong valid inequalities
for the IP model, possibly resulting in better B&C codes. Another
interesting investigation would be to find what actually makes a DPM instance
hard. To this end, we tried to evaluate which of the parameters displayed in
Tab. 1 seemed to affect most the computation time of the IP codes. However,
our studies were inconclusive. Probably, the structures of the input graphs play
a more important role than the statistics that we considered here.
Abstract. Energy has emerged as a critical constraint in mobile com-
puting because the power availability in most of these systems is limited 
by the battery power of the device. In this paper, we focus on the memory 
energy dissipation. This is motivated by the fact that, for data intensive 
applications, a significant amount of energy is dissipated in the mem- 
ory. Advanced memory architectures like the Mobile SDRAM and the 
RDRAM support multiple power states of memory banks, which can be 
exploited to reduce energy dissipation in the system. Therefore, it is im- 
portant to design efficient controller policies that transition among power 
states. Since the addressed memory chip must be in the active state in 
order to perform a read/write operation, the key point is the tradeoff
between the energy reduction due to the use of low power modes and 
the energy overheads of the resulting activations. The lack of rigorous 
models for energy analysis is the main motivation of this work. Assuming 
regular transitions, we derive a formal model that captures the relation 
between the energy complexity and the memory activities. Given a prede- 
termined number of activations, we approximate the optimal repartition 
among available power modes. We evaluate our model on the RDRAM 
and analyze the behavior of each parameter together with the energy 
that can be saved or lost. 



1 Introduction 

Due to the growing popularity of embedded systems [13,14,15], energy has 
emerged as a new optimization metric for system design. As the power avail- 
ability in most of these systems is limited by the battery power of the device, 
it is critical to reduce energy dissipation in these systems to maximize their op- 
eration cycle. Power limitation is also motivated by heat or noise limitations, 
depending on the target application. 

The topic of energy reduction has been intensively studied in the literature 
and is being investigated at all levels of system abstraction, from the physi- 
cal layout to software design. There have been several contributions on energy 
saving focused on scheduling/processors [6,7,8], data organizations [9,1], com- 
pilation [17,18,19,24], and the algorithmic level [21,22,24]. The research at the 
architecture level has led to new and advanced low energy architectures, like 
the Mobile SDRAM and the RDRAM, that support several low power features 
such as multiple power states of memory banks with dynamic transitions [11,12],
row/column specific activation, partial array refresh, and dynamic
voltage/frequency scaling [20].

It is well known that the largest part of energy dissipation comes 
from memory activities [1,2], sometimes more than 90% [12]. Consequently, the 
topic of memory energy reduction is now in the spotlight. For the purpose 
of reducing the energy dissipation, contributions on cache memory optimization 
can be considered because of the resulting reduction of memory accesses [3,4, 
5,22]. In order to benefit from the availability of different memory operating 
modes, effective memory controller policies should balance the 
energy reduction obtained from the use of low power modes against the energy 
overhead of the consequent activations (exit latency and synchronization time) 
[25]. A combinatorial scheduling technique is proposed by Tadonki et al. [23]. 
A threshold approach is considered by Fan et al. [25] in order to detect the 
appropriate instant for transitions into low power modes. A hardware-assisted 
approach for the detection and estimation of idleness in order to perform power 
mode transitions is studied by Delaluz et al. [12]. 

The goal of this paper is to design and evaluate a formal model for the energy 
minimization problem. This is important as a first step toward the design of an 
efficient power management policy. Our model clearly shows the relative impact 
of the storage cost and the activation overheads. The optimization problem 
derived from our model is a quadratic programming problem, which is solved well 
by standard routines. We consider only the transitions from low power modes 
to the active mode; thus, in the paper, we say activation instead of transition. 
Given a predetermined number of activations to be performed, our model gives 
the optimal assignment among power modes and the corresponding fraction of 
time that should be spent in each mode. It is clear that there is a correlation 
between the number of activations and the time we are allowed to spend in each 
mode. It is important to assume that the time we spend in a low power mode 
after a transition is bounded. Otherwise, we should transition to the lowest power 
mode and stay in that mode until the end of the computation. This is unrealistic 
in general because memory accesses will occur, very often at an unpredictable 
time. To capture this aspect, we consider a time slot for each power mode. Each 
transition to a given power mode implies that we will spend a period of time 
that is in a fixed (parameterizable) range. Once the parameters have been fixed, 
the resulting optimal energy becomes a function of the number of activations, 
which should be in a certain range in order to yield an energy reduction. 

The rest of the paper is organized as follows. Section 2 presents our model 
for energy evaluation. In Section 3, we formulate the optimization problem 
behind the energy minimization. An evaluation with the RDRAM is presented 
in Section 4. We conclude in Section 5. 

2 A Model of Energy Evaluation 

We assume that the energy spent for running an algorithm depends on three 
major types of operation: 

— the operations performed by the processor (arithmetic and logical operations, 
comparisons, etc.); 

— the operations performed on the memory (read/write operations, storage, 
and state transition); 

— the data transfers at all levels of the hardware system. 

In this paper, we will focus only on the energy consumed by memory operations. 
We consider the memory energy model defined in [21], which we restate here. 
The memory energy $E(n)$ for problem size n is defined as the sum of the memory 
access energy, the data storage energy, and state transition overheads. This yields 
the formula 

$$E(n) = K_a \times C(n) + K_s \times S(n) \times A(n) + K_p \times P(n), \tag{1}$$

where 

— $K_a$ is the access energy cost per unit of data, and $C(n)$ represents the total 
number of memory accesses; 

— $K_s$ is the storage energy cost per unit of data per unit time, $S(n)$ is the 
space complexity, and $A(n)$ is the total time for which the memory is active; 

— $K_p$ is the energy overhead for each power transition, and $P(n)$ represents 
the total number of state transitions. 
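
As a minimal sketch, formula (1) can be evaluated directly once the three cost constants and the three complexity terms are known. The function name and all numeric values below are hypothetical and only illustrate how the three terms combine.

```python
def memory_energy(k_a, k_s, k_p, accesses, space, active_time, transitions):
    """Evaluate E(n) = K_a*C(n) + K_s*S(n)*A(n) + K_p*P(n) from formula (1)."""
    access_energy = k_a * accesses                # K_a * C(n)
    storage_energy = k_s * space * active_time    # K_s * S(n) * A(n)
    transition_energy = k_p * transitions         # K_p * P(n)
    return access_energy + storage_energy + transition_energy


# Hypothetical values, chosen only to exercise the formula.
example = memory_energy(k_a=2.0, k_s=0.5, k_p=10.0,
                        accesses=100, space=4, active_time=20, transitions=3)
```

With these made-up inputs the three terms are 200, 40 and 30, so the storage term only dominates when the active time or the space is large.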

As we can see, the model considers two memory states (active and inactive) 
and a single memory bank. Moreover, the storage cost in intermediate modes 
is neglected; otherwise we would have to consider $T(n)$ (the total computation 
time) instead of $A(n)$ (the total active time). In our paper, we consider the 
general case with any given number of memory states, and several memory banks 
with independent power control. 

The main memory M is composed of p banks, and each bank has q possible 
inactive states. We denote the whole set of states by $S = \{0, 1, 2, \ldots, q\}$, where 0 
stands for the active state. For state transitions, we consider only the activations 
(transitions from a low power mode to the active mode). This is justified by the 
fact that transitions to low power modes incur a negligible energy dissipation. 
The activation energy overhead is given by the vector $W = (w_0, w_1, \ldots, w_q)$, 
with $w_0 = 0$. During the execution of an algorithm, a given bank i spends a fraction 
$\alpha_{ij}$ of the whole time in state j, thus we have 

$$\sum_{j=0}^{q} \alpha_{ij} = 1. \tag{2}$$

Concerning the storage cost, let $Q = (q_j)$, $j = 0, \ldots, q$, denote the vector of 
storage costs, meaning that $q_j$ is the storage cost per unit of data and per unit of time when 
the memory is in power state j. 

Concerning the activation complexity, note that since activations occur in a 
sequential processing, and the transition cost does not depend on the memory 
bank, we only need to consider the number of activations from each state j, which we 
denote by $x_j$. We then define the activation vector $x = (x_0, x_1, \ldots, x_q)$. 

If we assume that all memory banks have the same volume $a$, we obtain the 
following memory energy formula for problem size n: 

$$E(n) = K_a \times C(n) + T(n) \times \Big( \sum_{i=1}^{p} \sum_{j=0}^{q} \alpha_{ij} q_j \Big) \times a + \sum_{j=0}^{q} x_j w_j. \tag{3}$$

We define the vector $y = (y_0, y_1, \ldots, y_q)$ by 

$$y_j = \sum_{i=1}^{p} \alpha_{ij}. \tag{4}$$

For a given state j, $y_j$ is the accumulation of the fractions of time each memory 
bank has spent in mode j. In the case of a single memory bank, it is the fraction of 
the total execution time spent in the considered mode. The reader can easily see 
that 

$$\sum_{j=0}^{q} y_j = p.$$

We shall consider the following straightforward equality: 

$$\sum_{i=1}^{p} \sum_{j=0}^{q} \alpha_{ij} q_j = \sum_{j=0}^{q} \sum_{i=1}^{p} \alpha_{ij} q_j = y Q^T.$$

We define the vector $H = (H_0, H_1, \ldots, H_q)$ as the vector of activation delays: 
$H_j$ is the time overhead induced by an activation from state j ($H_0 = 0$). 

The total time $T(n)$ is composed of 

— the CPU time $\tau(n)$; 

— the memory access time $\delta C(n)$ ($\delta$ is the single memory access delay); 

— the activation overhead $H x^T$. 

We can write 

$$E(n) = K_a \times C + a \times (\tau + \delta C + H x^T) \times y Q^T + x W^T. \tag{6}$$

We make the following considerations: 

— the power management energy overhead is negligible [21]; 

— the additive part $K_a \times C(n)$ can be dropped since it does not depend on 
the power state management. 

Thus, the objective to be minimized is (proportional to) the following: 

$$E(x, y) = [H x^T + (\tau + \delta C)] \, y Q^T. \tag{7}$$



3 Optimization 

3.1 Problem Formulation 

Our goal is to study the energy reduction through the minimization of the 
objective (7). In order to be consistent and also avoid useless (or trivial) 
solutions, a number of constraints should be considered: 

Domain specification. The variables x and y are vectors of natural and real 
numbers respectively, i.e. 

$$x \in \mathbb{N}^q, \tag{8}$$

$$y \in \mathbb{R}^q. \tag{9}$$

Time consistency. As previously explained, we have 

$$y \geq 0, \tag{10}$$

$$y_0 + y_1 + \cdots + y_q = p. \tag{11}$$

Another constraint that should be considered here is related to the fraction of 
time spent in the active mode ($y_0$). Indeed, the time spent in the active mode 
is at least the total memory access time, which can be estimated from the 
number of memory accesses C and the time of a single access $\delta$. Since we 
consider fractions of time, we have 

$$y_0 \geq \frac{\delta C}{R}, \tag{12}$$

where $\delta C$ is the total memory access time, and R is the total running time 
(without the power management overhead), which can be estimated from the 
time complexity of the program or from profiling. 



Activation bounds. It is reasonable to assume that each time a memory bank 
is activated, it will sooner or later be accessed. Thus, we have 

$$\sum_{i=0}^{q} x_i \leq C. \tag{13}$$

However, except in the ideal case of highly regular and predictable memory 
accesses, several activations should be performed for a better use of the available 
power modes. This is captured by a lower bound on the number of activations. 
Thus, we have a lower bound and an upper bound on the number of activations. 
In our model we consider a fixed number of activations instead of a range. This 
gives 

$$x_1 + x_2 + \cdots + x_q = \rho C, \tag{14}$$

where $\rho$ is a scaling factor such that $0 < \rho < 1$. 

Compatibility between time and activation. Recall that a memory bank is 
activated if and only if it will be accessed. Moreover, when a memory bank is put 
in a given low power mode, a minimum (resp. maximum) period of time is spent 
in that mode before transitioning to the active mode. This can be the fraction 
of time taken by the smallest job (or instruction, depending on the granularity). 
We consider the set of time intervals $[\psi_j, \eta_j]$ for the low power modes. Then, we have 

$$\psi_j x_j \leq y_j \leq \eta_j x_j \quad \text{for } j = 1, \ldots, q. \tag{15}$$

In addition, since every activation implies a minimum period of time in the 
active mode, which we denote by $\gamma$, we also have 

$$y_0 \geq \gamma \sum_{j=0}^{q} x_j. \tag{16}$$

Using relation (14), relation (16) becomes 

$$y_0 \geq \gamma \rho C. \tag{17}$$

We shall consider $\mu$ defined by 

$$\mu = \max\Big\{ \frac{\delta}{R}, \gamma \rho \Big\}. \tag{18}$$

The inequalities (12) and (17) can be combined into 

$$y_0 \geq \mu C. \tag{19}$$

We now analyze the model. 

We now analyze the model. 

3.2 Model Analysis 

We first note that transitioning from the active state to state j for a period of 
time $\Delta t$ is advantageous (based on storage cost) if and only if we have 

$$q_j (\Delta t + H_j) < q_0 \Delta t, \tag{20}$$

which gives the following threshold relation 

$$\Delta t > \Big( \frac{q_j}{q_0 - q_j} \Big) H_j. \tag{21}$$

The time threshold vector D defined by 

$$D_j = \Big( \frac{q_j}{q_0 - q_j} \Big) H_j, \quad j = 1, 2, \ldots, q, \tag{22}$$

provides the minimum period of time that should be spent in each low power 
mode, and is also a good indicator of their relative impact. We 
propose to select the time intervals (15) for low power modes as follows: 

$$\psi_j = \lambda_1 \frac{D_j}{R}, \qquad \eta_j = \lambda_2 \frac{D_j}{R}, \tag{23}$$

where $1 < \lambda_1 < \lambda_2$. 

Lastly, the active time threshold as defined in (17) should be at least the 
memory access time. Then we should have 

$$\gamma \geq \frac{\delta}{\rho R}. \tag{24}$$

We now solve the optimization problem provided by our model as described 
above. 



3.3 Solving the Optimization Problem 

According to our model, the optimization problem behind the energy reduction 
is the following: 

min  $x H^T Q y^T + R \, Q y^T$ 

subject to 

1. $x \in \mathbb{N}^q$, 
2. $y \in \mathbb{R}^q$, $y \geq 0$, 
3. $y_0 + y_1 + \cdots + y_q = p$, 
4. $y_0 \geq \mu C$, 
5. $x_1 + x_2 + \cdots + x_q = \rho C$, 
6. $y \leq \eta x$, 
7. $y \geq \psi x$. 

Fig. 1. Energy minimization problem 

There are mainly two ways of solving the optimization problem formulated 
in Figure 1. The first approach is to consider the problem as a mixed integer 
programming problem (MIP). For a given value of x, the resulting model 
becomes a linear programming (LP) problem. Thus, appropriate techniques like 
the standard LP-based branch and bound can be considered. However, we think 
that this is an unnecessarily challenging computation. Indeed, a single transition 
does not have a significant impact on the overall energy dissipation as quantified 
by our model. Thus, we may consider a pragmatic approach where the variable x 
is first assumed to be continuous, and next rounded down in order to obtain the 
required solution. This second approach yields a simple quadratic programming 
model that is easily solved by standard routines. 
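
The remark that the model becomes an LP once x is fixed can be illustrated with a small sketch. For fixed x the factor multiplying the storage term is a positive constant, so it suffices to minimize the storage rate Q·y under one equality constraint and box bounds. Such an LP can be solved greedily by starting from the lower bounds and giving the remaining time to the cheapest states first. This greedy routine is a swapped-in illustration, not the branch and bound or the quadratic programming routine used by the authors, and the function name and the values of `psi` and `eta` are hypothetical.

```python
def best_time_split(x, Q, psi, eta, p, y0_min):
    """For fixed activations x, return y minimizing sum(Q[j]*y[j]) subject to
    sum(y) = p, y[0] >= y0_min and psi[j]*x[j] <= y[j] <= eta[j]*x[j] for j >= 1.
    Returns None when the constraints cannot be satisfied."""
    n = len(Q)
    lower = [y0_min] + [psi[j] * x[j] for j in range(1, n)]
    upper = [float(p)] + [eta[j] * x[j] for j in range(1, n)]
    y = list(lower)
    remaining = p - sum(lower)
    if remaining < 0:
        return None
    # Hand out the remaining time to the states with the lowest storage cost first.
    for j in sorted(range(n), key=lambda j: Q[j]):
        extra = min(upper[j] - y[j], remaining)
        if extra > 0:
            y[j] += extra
            remaining -= extra
    return y if remaining <= 1e-9 else None


Q = [300, 180, 30, 3]              # storage costs, state 0 is the active state
psi = [0, 0.01, 0.01, 0.1]         # hypothetical lower time fractions per activation
eta = [0, 0.05, 0.05, 0.5]         # hypothetical upper time fractions per activation
y_example = best_time_split([0, 0, 97, 3], Q, psi, eta, p=8, y0_min=3)
y_idle = best_time_split([0, 0, 0, 0], Q, psi, eta, p=8, y0_min=3)
```

With no activations at all, every low power state is forced to zero and the whole time budget goes to the active state, which is the always-active case.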

4 Experiments 

We evaluate our model with the values provided in [25] for the RDRAM. Figure 2 
summarizes the corresponding values (vector D is calculated using formula 
(22)). 

Q = (300 180 30 3) 
H = (0 6 60 6000) 
p = 8 
q = 4 
δ = 60 
D = (9.00 6.67 60.61) 

Fig. 2. DRAM parameters 
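
The threshold vector D can be recomputed from formula (22) as a quick check. The sketch below is only an illustration: the function name is hypothetical, and the delay of the first low power mode is taken as 6, which is an assumption chosen because it is the value for which formula (22) reproduces the listed first threshold 9.00.

```python
def time_thresholds(Q, H):
    """Formula (22): D_j = q_j * H_j / (q_0 - q_j) for each low power state j >= 1."""
    q0 = Q[0]
    return [q * h / (q0 - q) for q, h in zip(Q[1:], H[1:])]


Q = [300, 180, 30, 3]      # storage cost per state, state 0 is the active state
H = [0, 6, 60, 6000]       # activation delays; the value 6 is an assumption (see above)
D = time_thresholds(Q, H)
rounded = [round(d, 2) for d in D]
```

The deepest mode has by far the largest threshold, so it only pays off when a bank can stay idle for a long period.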



Our optimization is performed using MATLAB with the following code: 

function [X,Y,E] = Energy_Opt(H,Q,R,d,C,p,q,r,g,l1,l2) 
% Matlab code to solve the energy minimization problem 
% The quadratic objective is considered as follows 
%     0.5 * Z' * HH * Z + ff' * Z,  with Z = [X; Y] 
% H, Q: row vectors of activation delays and storage costs (length q) 
% R: running time, d: single access delay, C: number of accesses 
% p: number of banks, r: scaling factor, g: active time threshold 
% l1, l2: multipliers of the time thresholds 

% We form our objective coefficients 
HH = [zeros(q, q), H' * Q; Q' * H, zeros(q, q)]; 
ff = R * [zeros(q, 1); Q']; 

% Time thresholds from formula (22); the first entry (active state) is unused 
D = [0, Q(2:q) .* H(2:q) ./ (Q(1) - Q(2:q))]; 

% Bounds on the main variable Z = [X; Y] 
LB = [zeros(q, 1); zeros(q, 1)]; 
UB = [inf * ones(q, 1); p * ones(q, 1)]; 

% Adjust the lower bound on Y(1) (y0 in the text) 
LB(q+1) = max(d * C / R, g * r * C); 

% Matrix of the equality constraints 
Aeq = [ones(1, q), zeros(1, q); zeros(1, q), ones(1, q)]; 
beq = [r * C; p]; 

% Matrix of the inequality constraints 
a1 = [l1 * diag(D), -eye(q)];  b1 = zeros(q, 1); 
a2 = [-l2 * diag(D), eye(q)];  b2 = zeros(q, 1); 

% y0 is not bounded by X 
a1(1, :) = [];  b1(1) = [];  a2(1, :) = [];  b2(1) = []; 

% Forming the matrix 
A = [a1; a2];  b = [b1; b2]; 

% Optimization using the solver quadprog of MATLAB 
[Z, E, EXITF, OUTPUT] = quadprog(HH, ff, A, b, Aeq, beq, LB, UB); 

% Retrieving X and Y from Z 
X = Z(1:q); 
Y = Z(q + 1: 2 * q); 

We consider an (abstracted) problem where 75% of the total time is spent in 
memory accesses. We used R = 80000 and C = 1000. Note that our objective 
function is proportional to the time vector y and the vector of storage coefficients 
Q. Thus, the measuring unit can be scaled as desired without changing the 
optimal argument. Figure 3 displays a selection of optimal activation repartitions 
and the percentage of energy that is saved or lost. Figure 4 shows how the energy 
varies in relation to the number of activations. 
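
As a hedged illustration of how the objective is evaluated, the sketch below computes the objective (7) for the always-active case with the RDRAM storage costs, p = 8 and R = 80000. The function name is hypothetical, and the delay 6 used for the first low power mode is an assumption (it is the value consistent with the listed threshold D under formula (22)); it does not affect this particular case because no activation takes place.

```python
def objective(x, y, H, Q, R):
    """Objective (7): (H x^T + R) * (y Q^T), where R stands for tau + delta*C."""
    activation_delay = sum(h * xj for h, xj in zip(H, x))
    storage_rate = sum(q * yj for q, yj in zip(Q, y))
    return (activation_delay + R) * storage_rate


Q = [300, 180, 30, 3]
H = [0, 6, 60, 6000]       # the value 6 is an assumption (see the lead-in)
R = 80000

# Always-active case: no activation, all 8 banks spend the whole time in state 0.
baseline = objective([0, 0, 0, 0], [8, 0, 0, 0], H, Q, R)
```

The result, 1.92 × 10^8, agrees up to the scaling of units with the value 1.92 listed for the always-active row of Figure 3, which is the baseline for the reported reductions.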



ρ       Nact   X                    Y                      Eopt    Reduction 
0       0      (0, 0, 0, 0)         (8, 0, 0, 0)           1.92    0% 
0.01    10     (0, 10, 0, 0)        (7.58, 0, 0.42, 0)     1.84    4% 
0.02    20     (0, 0, 7, 13)        (3, 0, 0.31, 4.69)     1.43    25% 
0.05    50     (0, 0, 41, 9)        (3, 0, 1.71, 3.29)     1.30    32% 
0.10    100    (0, 0, 97, 3)        (3, 0, 4.04, 0.96)     1.04    46% 
0.11    110    (0, 8, 100, 2)       (3, 0.1, 4.15, 0.74)   1.024   47% 
0.125   125    (0, 25, 100, 0)      (3, 0.85, 4.15, 0)     1.014   48% 
0.20    200    (0, 100, 100, 0)     (3, 1.3, 3.7, 0)       1.08    44% 
0.21    210    (0, 100, 100, 10)    (3, 1.3, 0.83, 2.87)   1.72    -11% 
0.22    220    (0, 100, 100, 20)    (3, 1.3, 0.83, 2.87)   2.41    -26% 
0.25    250    (0, 100, 100, 25)    (3, 1.3, 0.83, 2.87)   2.76    -44% 

Fig. 3. Experiments with our model on an RDRAM 




Fig. 4. Energy vs the number of activations 

As we can see from Figure 3, the best number of activations is 125 (12.5% of the 
number of memory accesses), with an energy reduction of 48% (taking the always 
active case as baseline). We also see that there is a critical value for the number 
of activations (200 in this case) above which we begin losing energy. In addition, 
the optimal distribution of activations among low power modes depends on the 
total number of activations and the time we are allowed to stay in each mode. 



5 Conclusion 

We have formulated the problem of energy optimization in the context of several 
low power modes. We have shown that, in order to make a rewarding transition 
to a given low power mode, there is a minimum period of time that should 
be spent in that mode. From our experiments with an RDRAM, it follows that a 
reduction of 48% can be obtained by performing regular transitions. The optimal 
number of activations is determined experimentally. We think that our model 
can be used for a first evaluation of potential energy reduction before moving 
forward to any power management policy. 



References 

1. F. Catthoor, S. Wuytack, E.D. Greef, F. Balasa, L. Nachtergaele, and A. Vandecappelle, 
Custom memory management methodology - exploration of memory organization 
for embedded multimedia system design, Kluwer Academic Pub., June 
1998. 

2. A. R. Lebeck, X. Fan, H. Zeng, and C. S. Ellis, Power aware page allocation, Int. 
Conf. Arch. Support Prog. Lang. Ope. Syst., November 2000. 

3. M. B. Kamble and K. Ghose, Analytical energy dissipation models for low power 
caches, Int. Symp. Low Power Electronics and Design, 1997. 

4. W-T. Shiue and C. Chakrabarti, Memory exploration for low power embedded 
systems, Proc. DAC’99, New Orleans, Louisiana, 1999. 

5. C. Su and A. Despain, Cache design trade-offs for power and performance 
optimization: a case study, In Proc. Int. Symp. on Low Power Design, pp. 63-68, 1995. 

6. D. Brooks and M. Martonosi, Dynamically exploiting narrow width operands to 
improve processor power and performance. In Proc. Fifth Intl. Symp. High-Perf. 
Computer Architecture, Orlando, January 1999. 

7. V. Tiwari, S. Malik, A. Wolfe, and T. C. Lee, Instruction Level Power Analysis 
and Optimization of Software, Journal of VLSI Signal Processing Systems, Vol 13, 
No 2, August 1996. 

8. M. C. Toburen, T. M. Conte, and M. Reilly, Instruction scheduling for low power 
dissipation in high performance processors. In Proc. the Power Driven 
Micro-Architecture Workshop in conjunction with ISCA’98, Barcelona, June 1998. 

9. W. Ye, N. Vijaykrishnan, M. Kandemir, and M. J. Irwin, The design and use of 
SimplePower: a cycle-accurate energy estimation tool. In Proc. Design Automation 
Conference (DAC), Los Angeles, June 5-9, 2000. 

10. Todd Austin, Simplescalar, Master’s thesis, University of Wisconsin, 1998. 

11. 128/144-MBit Direct RDRAM Data Sheet, Rambus Inc., May 1999. 







12. V. Delaluz, M. Kandemir, N. Vijaykrishnan, A. Sivasubramaniam, and 
M. Irwin. Memory energy management using software and hardware directed power 
mode control. Tech. Report CSE-00-004, The Pennsylvania State University, April 
2000. 

13. W. Wolf. Software-Hardware Codesign of Embedded Systems. In Proceedings of 
the IEEE, volume 82, 1998. 

14. R. Ernst. Codesign of Embedded Systems: Status and Trends. In IEEE Design 
and Test of Computers, volume 15, 1998. 

15. Manfred Schlett. Trends in Embedded Microprocessors Design. In IEEE 
Computer, 1998. 

16. “Mobile SDRAM Power Saving Features,” Technical Note TN-48-10, MICRON, 
http://www.micron.com 

17. W. Tang, A. V. Veidenbaum, and R. Gupta. Architectural Adaptation for Power 
and Performance. In International Conference on ASIC, 2001. 

18. L. Benini and G. De Micheli. System-Level Optimization: Techniques and Tools. 
In ACM Transactions on Design Automation of Electronic Systems, 2000. 

19. T. Okuma, T. Ishihara, H. Yasuura. Software Energy Reduction Techniques for 
Variable-Voltage Processors. In IEEE Design and Test of Computers, 2001. 

20. J. Pouwelse, K. Langendoen, and H. Sips, “Dynamic Voltage Scaling on a 
Low-Power Microprocessor,” UbiCom-Tech. Report, 2000. 

21. M. Singh and V. K. Prasanna. Algorithmic Techniques for Memory Energy 
Reduction. In Workshop on Experimental Algorithms, Ascona, Switzerland, May 26-28, 
2003. 

22. S. Sen and S. Chatterjee. Towards a Theory of Cache-Efficient Algorithms. In 
SODA, 2000. 

23. C. Tadonki, J. Rolim, M. Singh, and V. Prasanna. Combinatorial Techniques for 
Memory Power State Scheduling in Energy Constrained Systems, Workshop on 
Approximation and Online Algorithms (WAOA), WAOA2003, Budapest, Hungary, 
September 2003. 

24. D.F. Bacon, S.L. Graham, and O.J. Sharp. Compiler Transformations for 
High-Performance Computing. Hermes, 1994. 

25. X. Fan, C. S. Ellis, and A. R. Lebeck. Memory Controller Policies for DRAM 
Power Management. ISLPED’01, August 6-7, Huntington Beach, California, 2001. 




A Heuristic for Minimum-Width Graph 
Layering with Consideration of Dummy Nodes 



Alexandre Tarassov¹, Nikola S. Nikolov¹, and Jürgen Branke² 



¹ CSIS Department, University of Limerick, Limerick, Ireland, 
{alexandre.tarassov,nikola.nikolov}@ul.ie 
² Institute AIFB, University of Karlsruhe, 76128 Karlsruhe, Germany. 
branke@aifb.uni-karlsruhe.de 



Abstract. We propose a new graph layering heuristic which can be used 
for hierarchical graph drawing with the minimum width. Our heuristic 
takes into account the space occupied by both the nodes and the edges of 
a directed acyclic graph and constructs layerings which are narrower than 
layerings constructed by the known layering algorithms. It can be used 
as a part of the Sugiyama method for hierarchical graph drawing. We 
present an extensive parameter study which we performed for designing 
our heuristic as well as for comparing it to other layering algorithms. 



1 Introduction 

The rapid development of Software Engineering in the last few decades has made 
Graph Drawing an important area of research. The Graph Drawing techniques 
find application in visualizing various diagrams, such as call graphs, precedence 
graphs, data-flow diagrams, ER diagrams, etc. In many of those applications it 
is required to draw a set of objects in a hierarchical relationship. Such sets are 
modeled by directed acyclic graphs (DAGs), i.e. directed graphs without directed 
cycles, and usually drawn by placing the graph nodes on parallel horizontal, 
concentric or radial levels with all edges pointing in the same direction. 

A few different methods for hierarchical graph drawing have been 
recognized. The two more recent ones are the evolutionary algorithm of Utech et al. [13] 
and the magnetic field model introduced by Sugiyama and Misue [11]. While they 
are an area of fruitful future research, an earlier method, widely known as the 
Sugiyama (or STT) method, has received most of the research attention and has 
become a standard method for hierarchical graph drawing. The STT method is 
a three-phase algorithmic framework, originally proposed by Sugiyama, Tagawa, 
and Toda [12], and also based on work by Warfield [14] and Carpano [3]. At its 
first phase the nodes of a DAG are placed on horizontal levels; at the second 
phase the nodes are ordered within each level; and at the final third phase the 
x- and y-coordinates of all nodes and the eventual edge bends are assigned. 
The STT method can be employed for drawing any directed graph by reversing 
the direction of some edges in advance to ensure that there are no directed cycles 
in the graph and restoring the original direction at the end [6]. 

In order to assign DAG nodes to horizontal levels at the first phase of the 
STT method it is necessary to partition the node set into subsets such that nodes 
connected by a directed path belong to different subsets. In addition, it must be 
possible to assign integer ranks to the subsets such that for each edge the rank 
of the subset that contains the target of the edge is less than the rank of the 
subset that contains its source. Such an ordered partition of the node set of a 
DAG is known as a layering and the corresponding subsets are called layers. A 
DAG with a layering is called a layered DAG. Figure 1 gives an example of two 
alternative layerings of the same DAG. Algorithms which partition the node set 
of a DAG into layers are known as layering algorithms. 




Fig. 1. Two alternative layerings of the same DAG. Each layer occupies a horizontal 
level marked by a dashed line. All edges point downwards. 



In this paper we propose a new polynomial-time layering algorithm which 
approximately solves the problem of hierarchical graph drawing with the mini- 
mum width. It finds application in the cases when it is necessary to draw a DAG 
in a narrow drawing area and it is the first successful polynomial-time algorithm 
that solves this particular problem. We also present the extensive parameter 
study we performed to design our algorithm. In the next section we formally in- 
troduce the terminology related to DAG layering. Then, in Section 3 we present 
the minimum-width DAG layering problem and the initial rough version of our 
layering heuristic. In Section 4 we further specify our heuristic through an extensive 
parameter study and compare it to other well-known layering algorithms. We 
draw conclusions from this work in Section 5. 



2 Mathematical Preliminaries 

A directed graph G = (V, E) is an ordered pair of a set of nodes V and a set of 
edges E. Each edge e is associated with an ordered pair of nodes (u, v); u is the 
source of e and v is the target of e. We denote this by e = (u, v). We consider only 
directed graphs where different edges are associated with different node pairs. 
The in-degree $d^-(v)$ of node v is the number of edges with target v, and the 
out-degree $d^+(v)$ of v is the number of edges with source v. We denote the set 
of all immediate predecessors of node v by $N_G^-(v)$, and the set of all immediate 
successors of node v by $N_G^+(v)$. That is, $N_G^-(v) = \{u : (u, v) \in E\}$ and 
$N_G^+(v) = \{u : (v, u) \in E\}$. The k-tuple of edges $p = ((u_1, u_2), (u_2, u_3), \ldots, (u_k, u_{k+1}))$ is 
called a directed path from node $u_1$ to node $u_{k+1}$ with length $k \geq 1$. If $u_1 = u_{k+1}$ 
then p is a directed cycle. In the rest of this work we consider only directed acyclic 
graphs (DAGs), i.e. directed graphs without directed cycles. 

Let G be a DAG and let $\mathcal{L} = \{L_1, \ldots, L_h\}$ be a partition of the node set of G 
into $h \geq 1$ subsets such that if $(u, v) \in E$ with $u \in L_j$ and $v \in L_i$ then $i < j$. $\mathcal{L}$ 
is called a layering of G and the sets $L_1, \ldots, L_h$ are called layers. A DAG with 
a layering is called a layered DAG. We assume that in a visual representation of 
a layered DAG all nodes in layer $L_i$ are placed on the horizontal level with 
y-coordinate i. Thus, we say that $L_j$ is above $L_i$ and $L_i$ is below $L_j$ if $i < j$. 

Let $l(u, \mathcal{L})$ denote the number of the layer which contains node $u \in V$, i.e. 
$l(u, \mathcal{L}) = i$ if and only if $u \in L_i$. Then the span of edge e = (u, v) in layering 
$\mathcal{L}$ is defined as $s(e, \mathcal{L}) = l(u, \mathcal{L}) - l(v, \mathcal{L})$. Clearly, $s(e, \mathcal{L}) \geq 1$ for each $e \in E$; 
edges with a span greater than 1 are long edges. A layering of G is proper if 
$s(e, \mathcal{L}) = 1$ for each $e \in E$, i.e. if there are no long edges. The layering found by 
a layering algorithm might not be proper because only a small fraction of DAGs 
can be layered properly and also because a proper layering may not satisfy other 
layering requirements. 

In the STT method for drawing DAGs the node ordering algorithms applied 
after the layering phase assume that their input is a DAG with a proper layering. 
Thus, if the layering found at the layering phase is not proper then it must be 
transformed into a proper one. Normally, this is done by introducing so-called 
dummy nodes which subdivide long edges (see Figure 2). 

It is desirable that the number of dummy nodes is as small as possible because 
a large number of dummy nodes significantly slows down the node ordering phase 
of the STT method. There are also aesthetic reasons for keeping the dummy node 
count small. A layered DAG with a small dummy node count would also have a 
small number of undesirable long edges and edge bends. 
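
The definitions of span, proper layering and dummy node count translate into a few lines of code. The sketch below is an illustration under simple assumptions: a layering is represented as a dictionary from node to layer number, edges are (source, target) pairs, and the function names are hypothetical. Each long edge of span s is subdivided by s - 1 dummy nodes.

```python
def spans(edges, layer_of):
    """Span of each edge (u, v): layer of the source minus layer of the target."""
    return {(u, v): layer_of[u] - layer_of[v] for (u, v) in edges}


def is_proper(edges, layer_of):
    """A layering is proper if every edge has span exactly 1."""
    return all(s == 1 for s in spans(edges, layer_of).values())


def dummy_node_count(edges, layer_of):
    """An edge of span s needs s - 1 dummy nodes to become a chain of short edges."""
    return sum(s - 1 for s in spans(edges, layer_of).values())


# Small example DAG: a -> b -> c plus the long edge a -> c.
edges = [("a", "b"), ("b", "c"), ("a", "c")]
layer_of = {"a": 3, "b": 2, "c": 1}
example_spans = spans(edges, layer_of)
example_dummies = dummy_node_count(edges, layer_of)
```

In this example only the edge from a to c is long, so a single dummy node on layer 2 makes the layering proper.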

A layering algorithm may also be expected to produce a layering with either 
a specified width and height, or a specified aspect ratio. The height of a layering is the 
number of layers. Normally the nodes of DAGs from real-life applications have 
text labels and sometimes prespecified shape. We define the width of a node to 
be the width of the rectangle that encloses the node. If the node has no text label 
and no information about its shape or size is available we assume that its width 
is one unit. The width of a layer is usually defined as the sum of the widths of 
all nodes in that layer (including the dummy nodes) and the width of a layering 
is the maximum width of a layer. Usually the width and the height of a layering 
are used to approximate the dimensions of the final drawing. 

The edge density between horizontal levels i and j with i < j is defined as the 
number of edges (u, v) with $u \in L_j \cup L_{j+1} \cup \ldots \cup L_h$ and $v \in L_1 \cup L_2 \cup \ldots \cup L_i$. The 
edge density of a layered DAG is the maximum edge density between adjacent 
layers (horizontal levels). Naturally, drawings with low edge density are clearer 
and easier to comprehend. 
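
The edge density of a layered DAG can be computed directly from this definition. The following sketch assumes the same hypothetical representation as a dictionary from node to layer number and a list of (source, target) pairs; the function name is an illustration only.

```python
def edge_density(edges, layer_of):
    """Maximum number of edges crossing the gap between adjacent levels i and i + 1."""
    height = max(layer_of.values())
    best = 0
    for i in range(1, height):
        # An edge crosses the gap if its source is on level i + 1 or above
        # and its target is on level i or below.
        crossing = sum(1 for u, v in edges
                       if layer_of[u] >= i + 1 and layer_of[v] <= i)
        best = max(best, crossing)
    return best


edges = [("a", "b"), ("b", "c"), ("a", "c")]
layer_of = {"a": 3, "b": 2, "c": 1}
density = edge_density(edges, layer_of)
```

The long edge from a to c is counted in both gaps, which is why long edges tend to raise the edge density.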

3 Minimum-Width DAG Layering 

Clearly, it is trivial to find a layering of a DAG with the minimum width if the 
width of a layer is considered equal to the sum of the widths of the original DAG 
nodes in that layer. In this case any layering with a single node per layer has 
the minimum width. However, such a definition of width does not approximate 
the width of the final drawing because the space occupied by long edges is not 
insignificant (see Figure 2). The contribution of the long edges to the layering 
width can be taken into account by assigning positive width to the dummy 
nodes and taking them into account when computing the layering width. It is sensible 
to assume that the dummy nodes occupy smaller space than the original DAG 
nodes especially in DAGs which come from practical applications and may have 
large node labels. 
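
A minimal sketch of this width measure is given below. It assumes a hypothetical representation of the layering as a dictionary from node to layer number, gives every node without a known width the default width of one unit, and charges one dummy node of width `dummy_width` for each layer that a long edge passes through.

```python
def layering_width(edges, layer_of, node_width, dummy_width):
    """Width of a layering: the maximum layer width, dummy nodes included."""
    height = max(layer_of.values())
    widths = [0.0] * (height + 1)          # index 0 is unused, layers start at 1
    for v, layer in layer_of.items():
        widths[layer] += node_width.get(v, 1.0)
    for u, v in edges:
        # One dummy node on every layer strictly between target and source.
        for layer in range(layer_of[v] + 1, layer_of[u]):
            widths[layer] += dummy_width
    return max(widths[1:])


edges = [("a", "b"), ("b", "c"), ("a", "c")]
layer_of = {"a": 3, "b": 2, "c": 1}
width_with_dummies = layering_width(edges, layer_of, {}, dummy_width=0.5)
width_without_dummies = layering_width(edges, layer_of, {}, dummy_width=0.0)
```

With one node per layer the width is 1 when dummy nodes are ignored, but the long edge from a to c widens layer 2 as soon as dummy nodes are given a positive width.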




Fig. 2. A hierarchical drawing of a DAG. The black circles are the original DAG nodes 
and the smaller white squares are the dummy nodes along long edges. All edges point 
downwards. 



It is NP-hard to find a layering with the minimum width when the contribu- 
tion of the dummy nodes is taken into account [2] . The first attempt to solve this 
problem by a heuristic algorithm belongs to Branke et al. [1]. They proposed a 
polynomial-time heuristic which did not meet their expectations about quality 
when tested with relatively small graphs. To the best of our knowledge the only 
method that can be used for minimum-width DAG layering is the branch-and- 
cut algorithm of Healy and Nikolov which takes as an input an upper bound 
on the width and produces a layering subject to it (if feasible) [8]. Although 
exact, the algorithm of Healy and Nikolov is very complex to implement and its 
running time is exponential in the worst case. 

In this work we design a simple polynomial-time algorithm which finds 
narrow layerings. We call it MinWidth. Similar to the algorithm of Branke et al. 
it is a heuristic and it does not guarantee the minimum width. Nevertheless it 
produces layerings which are narrower than the layerings produced by any of the 
known polynomial-time layering algorithms. In the remainder of this section we 
introduce the initial rough version of MinWidth which we tune and extensively 
test in Section 4. 



3.1 The Longest-Path Algorithm 

We base MinWidth on the longest-path algorithm displayed in Algorithm 1. The 
longest-path algorithm constructs layerings with the minimum height equal to 
the number of nodes in the longest directed path. It builds a layering layer by 
layer starting from the bottom layer labeled as layer 1. This is done with the help 
of two node sets U and Z which are empty at start. The value of the variable 
currentLayer is the label of the layer currently being built. As soon as a node 
gets assigned to a layer it is also added to the set U. Thus, U is the set of all 
nodes already assigned to a layer. Z is the set of all nodes assigned to a layer 
below the current layer. A new node v to be assigned to the current layer is 
picked among the nodes which have not been already assigned to a layer, i.e. 
$v \in V \setminus U$, and which have all their immediate successors assigned to the layers 
below the current one, i.e. $N_G^+(v) \subseteq Z$. 



Algorithm 1 The Longest-Path Algorithm(G) 
Requires: DAG $G = (V, E)$ 

    $U \leftarrow \emptyset$ 
    $Z \leftarrow \emptyset$ 
    currentLayer $\leftarrow$ 1 
    while $U \neq V$ do 
        Select node $v \in V \setminus U$ with $N_G^+(v) \subseteq Z$ 
        if v has been selected then 
            Assign v to the layer with a number currentLayer 
            $U \leftarrow U \cup \{v\}$ 
        end if 
        if no node has been selected then 
            currentLayer $\leftarrow$ currentLayer + 1 
            $Z \leftarrow Z \cup U$ 
        end if 
    end while 
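
For illustration, Algorithm 1 can be sketched in Python as follows. The sketch is ours, not the implementation used in the experiments. It assumes that the DAG is given as a dictionary `succ` (a hypothetical name) mapping every node to the list of its immediate successors, that the input really is acyclic, and that ties between candidates are broken by dictionary order.

```python
def longest_path_layering(succ):
    """Layer a DAG as in Algorithm 1.

    succ maps every node to the list of its immediate successors
    (assumed representation). Returns a dict node -> layer, where
    layer 1 is the bottom layer.
    """
    nodes = list(succ)
    assigned = set()   # U: nodes already assigned to some layer
    below = set()      # Z: nodes assigned to layers below the current one
    layer = {}
    current = 1
    while len(assigned) < len(nodes):
        # First unassigned node whose successors all lie below the current layer.
        candidate = next(
            (v for v in nodes if v not in assigned and set(succ[v]) <= below),
            None,
        )
        if candidate is not None:
            layer[candidate] = current
            assigned.add(candidate)
        else:
            # No node fits into the current layer: move one layer up.
            current += 1
            below |= assigned
    return layer
```

On a DAG with the edges (a, b), (a, c), (b, d), and (c, d), the sketch places d on layer 1, b and c on layer 2, and a on layer 3.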



3.2 A Rough Version of MinWidth 

In the following, we will assume that all dummy nodes have the same width, 
$w_d$, although our considerations can be easily generalized to variable dummy 
node widths. We will also assume that $w(v)$ is the width of node v. We start 
with an initial rough version of MinWidth, displayed in Algorithm 2, which 








Algorithm 2 MinWidth(G) 
Requires: DAG $G = (V, E)$ 

    $U \leftarrow \emptyset$; $Z \leftarrow \emptyset$ 
    currentLayer $\leftarrow$ 1; widthCurrent $\leftarrow$ 0; widthUp $\leftarrow$ 0 
    while $U \neq V$ do 
        Select node $v \in V \setminus U$ with $N_G^+(v) \subseteq Z$ and ConditionSelect 
        if v has been selected then 
            Assign v to the layer with a number currentLayer 
            $U \leftarrow U \cup \{v\}$ 
            widthCurrent $\leftarrow$ widthCurrent $- w_d \cdot d^+(v) + w(v)$ 
            Update widthUp 
        end if 
        if no node has been selected OR ConditionGoUp then 
            currentLayer $\leftarrow$ currentLayer + 1 
            $Z \leftarrow Z \cup U$ 
            widthCurrent $\leftarrow$ widthUp 
            Update widthUp 
        end if 
    end while 



contains a number of unspecified parameters. We specify them later in Section 4 
through an extensive parameter study. 

We employ two variables, widthCurrent and widthUp, which store 
the width of the current layer and the width of the layers above it, respectively. 
The width of the current layer, widthCurrent, is calculated as the sum of the 
widths of the nodes already placed in that layer plus the sum of the widths of 
the potential dummy nodes along edges with a source in $V \setminus U$ and a target in 
Z (one dummy node per edge). The variable widthUp provides an estimate of 
the width of any layer above the current one. It is the sum of the widths of the 
potential dummy nodes along edges with a source in $V \setminus U$ and a target in U 
(one dummy node per edge). 
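
The two definitions can be evaluated directly from the sets U and Z. The following Python sketch does so; it is only meant to make the definitions concrete (Algorithm 2 maintains both values incrementally instead). The function name, the `succ` dictionary of immediate successors, and the parameter names are our assumptions.

```python
def widths_by_definition(succ, assigned, below, current_layer, node_width,
                         dummy_width=1.0):
    """Evaluate widthCurrent and widthUp directly from their definitions.

    succ          -- node -> list of immediate successors (assumed representation)
    assigned      -- U, the nodes already assigned to a layer
    below         -- Z, the nodes assigned to layers below the current one
    current_layer -- the nodes already placed in the current layer
    node_width    -- node -> width w(v)
    dummy_width   -- w_d, the common width of all dummy nodes
    """
    width_current = sum(node_width[v] for v in current_layer)
    width_up = 0.0
    for source, targets in succ.items():
        if source in assigned:
            continue  # only edges with a source in V \ U contribute
        for target in targets:
            if target in below:
                width_current += dummy_width  # potential dummy in the current layer
            if target in assigned:
                width_up += dummy_width       # potential dummy in the layers above
    return width_current, width_up
```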

When we select a node to be placed in a layer we employ an additional 
condition ConditionSelect. Our intention is to specify ConditionSelect so 
that the choice of node v (among alternative candidates) will lead to as narrow 
a layering as possible. We propose to explore the following three alternatives as 
ConditionSelect: 

- A1: v is the candidate with the maximum outdegree; 

- A2: v is the candidate with the maximum $d^+(v) - d^-(v)$; 

- A3: v or any immediate predecessor of v has the maximum $d^+(v) - d^-(v)$ 
among all candidates and their immediate predecessors. 

In A1 we select the candidate with the maximum outdegree because that choice 
will lead to the maximum possible improvement of widthCurrent. A2 and A3 
are less greedy alternatives which do not make the best choice in terms of 
widthCurrent but also look at the effect on the upper layers. By choosing the 
candidate with the maximum $d^+(v) - d^-(v)$, A2 makes the choice that will bring 
the best improvement to widthUp. The idea behind A3 is to allow nodes which 
can bring a big improvement to the width of some upper layer to do so without 
being blocked by their successors with low $d^+(v) - d^-(v)$. Thus, A3 represents 
an alternative that tries to choose a node by looking ahead at the impact of that 
choice on the layering width. 
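
The three alternatives can be sketched as selection functions over the list of current candidates. This is a minimal sketch under our own assumptions: the function names and the `succ` dictionary are hypothetical, ties are broken in favour of the first candidate, and A3 is read as "score each candidate by the best value of $d^+ - d^-$ found among the candidate itself and its immediate predecessors".

```python
def degrees_and_predecessors(succ):
    """Return (indegree dict, predecessor dict) for a successor dictionary."""
    indeg = {v: 0 for v in succ}
    pred = {v: [] for v in succ}
    for source, targets in succ.items():
        for target in targets:
            indeg[target] += 1
            pred[target].append(source)
    return indeg, pred


def select_a1(candidates, succ):
    """A1: the candidate with the maximum outdegree."""
    return max(candidates, key=lambda v: len(succ[v]))


def select_a2(candidates, succ, indeg):
    """A2: the candidate with the maximum d+(v) - d-(v)."""
    return max(candidates, key=lambda v: len(succ[v]) - indeg[v])


def select_a3(candidates, succ, indeg, pred):
    """A3 (our reading): the candidate for which it, or one of its immediate
    predecessors, attains the maximum d+ - d-."""
    def score(v):
        return max(len(succ[u]) - indeg[u] for u in [v, *pred[v]])
    return max(candidates, key=score)
```

In a graph where candidate a has outdegree 1 but a predecessor of high outdegree, while candidate b has outdegree 2 and no predecessors, A1 and A2 pick b whereas A3 picks a.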

In order to control the width of the layering we introduce a second modification 
to the longest-path algorithm: an additional condition for moving up to a new 
layer, ConditionGoUp. The idea is to move to a new 
layer if the width of the current layer or of the layers above it becomes too large. 
In order to be able to check this we introduce the parameter UBW, against 
which we compare the width of the current layer. Since widthUp 
represents only an approximation of the width of the layers above the current 
layer, we propose to compare it to $c \times UBW$ where $c \geq 1$, i.e. c gives 
freedom to widthUp to be larger than widthCurrent because widthUp is just an 
estimate of the width of the upper layers. We do not consider UBW and c as 
input parameters; we would like to have their values (or narrow value ranges) 
hard-coded in MinWidth instead. We set up ConditionGoUp to be satisfied if 
either: 

- widthCurrent > UBW and $d^+(v) < 1$, or 

- widthUp > $c \times UBW$. 

We require $d^+(v) < 1$ for widthCurrent > UBW to be taken into account 
because the initial value of widthCurrent is determined by the dummy nodes in 
the current layer and it gets smaller (or at least it does not change) when a regular 
node with a positive outdegree is placed in the current layer. In that case the 
dummy nodes along edges with source v are removed from the current layer 
and are replaced by v. If $d^+(v) \geq 1$ then the condition widthCurrent > UBW 
on its own is not a reason for moving to the upper layer because there is still 
a chance to add nodes to the current layer which will reduce widthCurrent. If 
$d^+(v) < 1$ then the assignment of v to the current layer increases widthCurrent 
because it does not replace any dummy nodes. This is an indication that no 
further improvement of widthCurrent can be made. 
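
As a small illustration, ConditionGoUp can be written as a predicate. The sketch below assumes that v is the node most recently assigned to the current layer and uses the strict comparisons as stated above; the function and parameter names are ours.

```python
def condition_go_up(width_current, width_up, out_degree_v, ubw, c):
    """ConditionGoUp as described in the text.

    out_degree_v is d+(v) of the node v just placed in the current layer
    (an assumption about which node the condition refers to).
    """
    current_too_wide = width_current > ubw and out_degree_v < 1
    upper_too_wide = width_up > c * ubw
    return current_too_wide or upper_too_wide
```

For example, with UBW = 4 and c = 2, a current layer of width 5 triggers the move only if the node just placed is a sink; if that node has outgoing edges, the layer stays open because further nodes may still replace dummy nodes.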

In relation to the three alternatives, A1, A2, and A3, we consider two 
alternative modes of updating the value of widthUp: 

- Set widthUp to 0 when moving to the upper layer; add $w_d \times d^-(v)$ to widthUp 
each time a node v is assigned to the current layer; 

- Do not change widthUp when moving to the upper layer; add $w_d \times (d^-(v) - d^+(v))$ 
to widthUp each time a node v is assigned to the current layer. 

The first of the two modes builds up widthUp starting from zero and taking 
into account only dummy nodes along edges between $V \setminus U$ and the current layer. 
We employ this update mode with A1. The second mode approximates the width 
of the upper layers more precisely by keeping track of as many dummy nodes 
as possible. We employ it with A2 and A3, where the width of the upper layers 
plays a more important role. We consider the three alternatives A1, A2, and A3 
with the corresponding widthUp update modes as parallel branches in the rough 
version of MinWidth and we choose one of them as a result of our experimental 
work. 
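
The two update modes differ only in what happens at the two "Update widthUp" steps of Algorithm 2. A possible way to express them in Python is shown below; the event names `"assign"` and `"go_up"` and the integer mode labels are hypothetical and serve only this sketch.

```python
def update_width_up(width_up, event, mode, in_degree=0, out_degree=0,
                    dummy_width=1.0):
    """Return the new value of widthUp.

    event -- "assign" (a node v was placed in the current layer) or
             "go_up" (the algorithm moves to the next layer)
    mode  -- 1 for the first update mode (used with A1),
             2 for the second update mode (used with A2 and A3)
    in_degree, out_degree -- d-(v) and d+(v) of the assigned node
    """
    if event == "go_up":
        # Mode 1 restarts the estimate; mode 2 keeps what has been accumulated.
        return 0.0 if mode == 1 else width_up
    if event == "assign":
        if mode == 1:
            return width_up + dummy_width * in_degree
        return width_up + dummy_width * (in_degree - out_degree)
    raise ValueError(f"unknown event: {event!r}")
```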

In order to specify ConditionGoUp we need to set UBW and c. To specify 
ConditionSelect we need to select one of A1, A2, and A3. We propose to run 
MinWidth for 5911 test DAGs and various sets of values of UBW and c as well 
as for each of the alternatives A1, A2, or A3 with the corresponding widthUp 
update mode. We expect that the extensive experiments will suggest the most 
appropriate values or ranges of values for UBW and c as well as the winner among 
the alternatives A1, A2, or A3. 
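
Putting the pieces together, the rough version of MinWidth with alternative A1 and the first widthUp update mode could look as follows in Python. This is a sketch of our reading of Algorithm 2, not the code used in the experiments: the `succ` dictionary, the default unit node widths, the first-candidate tie-breaking, and the strict comparisons in ConditionGoUp are assumptions.

```python
def min_width_layering(succ, ubw, c, node_width=None, dummy_width=1.0):
    """Rough MinWidth (Algorithm 2) with A1 and the first widthUp update mode.

    succ maps every node to the list of its immediate successors.
    Returns a dict node -> layer, with layer 1 at the bottom.
    """
    nodes = list(succ)
    w = node_width or {v: 1.0 for v in nodes}
    indeg = {v: 0 for v in nodes}
    for targets in succ.values():
        for t in targets:
            indeg[t] += 1

    assigned, below, layer = set(), set(), {}   # U, Z, and the result
    current, width_current, width_up = 1, 0.0, 0.0
    while len(assigned) < len(nodes):
        candidates = [v for v in nodes
                      if v not in assigned and set(succ[v]) <= below]
        # ConditionSelect, alternative A1: maximum outdegree.
        v = max(candidates, key=lambda x: len(succ[x])) if candidates else None
        go_up = v is None
        if v is not None:
            layer[v] = current
            assigned.add(v)
            width_current += w[v] - dummy_width * len(succ[v])
            width_up += dummy_width * indeg[v]          # update mode 1
            go_up = ((width_current > ubw and len(succ[v]) < 1)
                     or width_up > c * ubw)             # ConditionGoUp
        if go_up:
            current += 1
            below |= assigned
            width_current = width_up
            width_up = 0.0                              # update mode 1
    return layer
```

For a star with one source r and five sinks, calling the sketch with `ubw=2` and `c=1` spreads the sinks over three layers (three, one, and one sink) and places r on a fourth layer, whereas the longest-path algorithm would put all five sinks on layer 1.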

4 Parameter Study 

In our experimental work we used 5911 DAGs from the well-known Rome graph 
dataset [5]. The Rome graphs come from practical applications. They are graphs 
with between 10 and 100 nodes and typically each of them has twice 
as many edges as nodes. We ran MinWidth with each of the three alternatives A1, 
A2, and A3, for each of the 5911 DAGs and for each pair (UBW, c) with UBW = 1..50 
and c = 1..10. In total, we had about 9 million tasks. We executed the tasks in 
a computational grid environment with two computational nodes. One of the 
computational nodes was a PC with a Pentium III/800 MHz processor, and the 
other was a PC with a Pentium 4/2.4 GHz processor. 
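
The task count follows from the product of the number of DAGs, the number of alternatives, and the number of (UBW, c) pairs:

```latex
\[
  5911 \times 3 \times 50 \times 10 \;=\; 8\,866\,500 \;\approx\; 9 \times 10^{6}.
\]
```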

4.1 A1, A2, and A3 Compared 

For each of the 5911 input DAGs and each alternative (A1, A2, and A3) we chose 
the layering with the smallest width (taking into account the dummy nodes) and 
stored the pair of parameters (UBW, c) for which it was achieved. As we stated 
above, we explored every combination of UBW = 1..50 and c = 1..10. 

Figures 3-8 compare various properties of the stored layerings. The x-axis in 
all pictures represents the number of original nodes in a graph. Since the Rome 
graphs have no node labels we assume that the width of all original and all 
dummy nodes is 1 unit if not specified otherwise. Thus, the layering width is the 
maximum number of nodes (original and dummy) per layer. We have partitioned 
all DAGs into groups by node count. Each group covers an interval of size 5 on 
the x-axis. We display the average result for each group. 
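
The width measure used in the figures can be computed from a layering as sketched below. The sketch assumes unit-width original nodes unless a `node_width` dictionary is given, and it places one dummy node on every layer strictly between the end points of an edge; the function and parameter names are ours.

```python
def layering_width(succ, layer, node_width=None, dummy_width=1.0):
    """Maximum layer width of a layering, counting dummy nodes.

    succ  -- node -> list of immediate successors (assumed representation)
    layer -- node -> layer number, with layer 1 at the bottom
    Set dummy_width=0.0 to neglect the contribution of the dummy nodes.
    """
    height = max(layer.values())
    widths = [0.0] * (height + 1)            # index 0 is unused
    for v, l in layer.items():
        widths[l] += 1.0 if node_width is None else node_width[v]
    for source, targets in succ.items():
        for target in targets:
            # One dummy node on each layer strictly between the end points.
            for l in range(layer[target] + 1, layer[source]):
                widths[l] += dummy_width
    return max(widths[1:])
```

For the DAG with edges (a, b), (a, c), and (b, c) layered as a: 3, b: 2, c: 1, the edge (a, c) contributes one dummy node to layer 2, so the width is 2 with dummy nodes and 1 without them.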

Figures 3(a) and (b) compare the width of the layerings taking into account 
the dummy nodes (i.e. each dummy node has width equal to one unit) and 
neglecting them (i.e. each dummy node has width equal to zero), respectively. 
In both cases A1 gives the narrowest layerings, which suggests that A1 might be 
the best option if the width of the dummy nodes is considered less than or equal 
to one unit (which is a reasonable assumption). The height of the A1 layerings 
(see Figure 4) is larger than the height of the other layerings. The height is the 
number of layers. It was expected that the narrower a layering, the larger is the 

Fig. 3. A1, A2, and A3 compared: layering width (a) taking into account and (b) 
neglecting the contribution of the dummy nodes. 



Fig. 4. A1, A2, and A3 compared: layering height (number of layers). 



number of layers. Figure 5(a) shows the dummy node count divided by the total 
node count in a DAG. Figure 5(b) shows the edge density divided by the total 
edge count in a DAG. We can observe that the A1 layerings have fewer dummy 
nodes and in general better edge density than the A2 and the A3 layerings. 
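
Both quantities can be read off a layering as in the following Python sketch. The definition of edge density is not restated in this section; the sketch assumes it is the maximum number of edges crossing the gap between two adjacent layers, and the function name and `succ` representation are ours.

```python
def dummy_count_and_edge_density(succ, layer):
    """Return (dummy node count, edge density) of a layering.

    Edge density is taken here to be the maximum number of edges that
    cross the gap between two adjacent layers (an assumed definition).
    """
    height = max(layer.values())
    crossing = [0] * (height + 1)   # crossing[i]: edges between layers i and i+1
    dummies = 0
    for source, targets in succ.items():
        for target in targets:
            # An edge spanning k layers needs k - 1 dummy nodes.
            dummies += layer[source] - layer[target] - 1
            for i in range(layer[target], layer[source]):
                crossing[i] += 1
    return dummies, max(crossing)
```

The normalized values plotted in Figure 5 would then be obtained by dividing the first result by the number of nodes and the second by the number of edges of the DAG.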

Similarly, Figures 6(a) and (b) show the values of UBW and c which lead 
to the narrowest layerings. The simplest alternative, A1, finds the narrowest layerings for 
considerably lower values of UBW and c than A2 and A3. Moreover, those values of 
UBW and c do not depend on the DAG size when A1 is employed. The conclusion 
that we can draw from these experiments is that the simplest alternative, A1, 
is superior to the other two. It is enough to run MinWidth with A1, UBW = 1..4 
and c = 1..2 in order to achieve the narrowest possible layerings. 

In any case MinWidth leads to layerings with a very high dummy node count. 
There is a simple heuristic that can be applied to a layering in order to reduce 
the dummy node count. It is the Promotion heuristic, which works by 
iteratively moving (or promoting) nodes to upper layers if that movement decreases 

Fig. 5. A1, A2, and A3 compared: normalized values of (a) the dummy node count and 
(b) the edge density. 

Fig. 6. A1, A2, and A3 compared: values of (a) UBW and (b) c for which a narrowest 
layering was found. 


the dummy node count [9]. The Promotion heuristic leads to close to the mini- 
mum dummy node count when applied to longest-path layerings. Since MinWidth 
is based on the longest-path algorithm we expected that the same Promotion 
heuristic might be successfully applied to MinWidth layerings as well. In the next 
section we compare MinWidth with A1 followed by the Promotion heuristic to 
some well-known layering algorithms. 



4.2 Effect of Promotion 

Figures 6(a) and (b) suggest that when A1 is employed it is enough to consider 
UBW = 1..4 and c = 1..2. Since MinWidth is very fast with fixed UBW and c, we 
can afford to run it for relatively narrow ranges of UBW and c values for better 
quality results. Thus, in a new series of experiments we ran MinWidth with A1 
for UBW = 1..4 and c = 1..2, and chose the combination (UBW, c) that leads 
to the narrowest layering. For convenience, we call the layering achieved by this 
method simply the MinWidth layering in the remainder of this section. 

We post-processed MinWidth layerings by applying to them the Promotion 
heuristic modified to perform a node promotion only if it does not increase the 
width of the layering. 

We also ran the longest-path algorithm and the Coffman-Graham algorithm 
followed by the same width-preserving node promotion. The Coffman-Graham 
algorithm takes an upper bound m on the number of nodes in a layer as an 
input parameter [4]. Thus, we ran it for m = 1..n, where n is the number of 
nodes in the DAG, and chose the narrowest layering. We also ran the network 
simplex algorithm of Gansner et al. [7] and compared the aesthetic properties of 
the four layering types: MinWidth, longest-path, Coffman-Graham and Gansner’s 
network simplex. The results of the comparison are presented in Figures 7-10. 




Fig. 7. Effect of promotion: layering width (a) taking into account and (b) neglecting 
the contribution of the dummy nodes. 



It can be observed that the promotion heuristic is very efficient when applied 
after MinWidth. MinWidth leads to considerably narrower but taller layerings 
than the other three algorithms (see Figures 7(a) and (b)). It was expected that 
the narrower a layering, the larger is the number of layers. This can be confirmed 
in Figure 10(a). 

The number of dummy nodes in the MinWidth layerings is close to the number 
of dummy nodes in the Coffman-Graham layerings and slightly higher than the 
number of dummy nodes in the longest-path and Gansner’s layerings, as can 
be seen in Figure 8(a). However, Figure 8(b) shows that the MinWidth layerings 
have considerably lower edge density than the other layerings, which means that 
they could possibly lead to clean drawings with a small number of edge crossings. 
The number of edge crossings is widely accepted as one of the most important 
graph drawing aesthetic criteria [10]. 

Fig. 8. Effect of promotion: normalized values of (a) the dummy node count and (b) 
the edge density. 




Fig. 9. Two layerings of the same DAG: (a) MinWidth and (b) Gansner’s. The MinWidth 
layering is narrower than Gansner’s layering (assuming all DAG nodes and all dummy 
nodes have width one unit). All edges point downwards. 



Figure 9 shows an example of the MinWidth layering of a DAG compared to 
Gansner’s layering of the same DAG. The DAG is taken from the Rome 
graph dataset. 

We ran the second group of experiments on a single Pentium 4/2.4 GHz 
processor. The running times are presented in Figure 10(b). We observed that 
the average running time for MinWidth followed by promotion is up to 2 seconds 
for DAGs having no more than 75 nodes and it grows up to 6.2 seconds for 
DAGs with more than 75 and fewer than 100 nodes. The total running time 
for the Coffman-Graham algorithm was within 3 seconds and the longest-path 
algorithm was the fastest of the three with running time within 2 seconds. 
Gansner’s layerings (which we computed with ILOG CPLEX) are the fastest to 
be computed. 





Fig. 10. Effect of promotion: (a) layering height and (b) running times in seconds. 



5 Conclusions 

Our parameter study shows that MinWidth with A1, UBW = 1..4, c = 1..2, and 
followed by width-preserving node promotion can be successfully employed as 
a heuristic for layering with the minimum width taking into account the con- 
tribution of the dummy nodes. This is the first successful attempt to design a 
heuristic for the NP-hard problem of minimum-width DAG layering with consid- 
eration of dummy nodes. It does not guarantee the minimum width but performs 
significantly faster than the only other alternative which is the exponential-time 
branch-and-cut algorithm of Healy and Nikolov. 

The aesthetic properties of the MinWidth layerings compare well to the prop- 
erties of the layerings constructed by the well-known layering algorithms. The 
MinWidth layerings have the lowest edge density which suggests that they could 
lead to clear and easy to comprehend drawings in the context of the STT method 
for hierarchical graph drawing. It has to be noted that the promotion heuris- 
tic slows down the computation significantly but the running time is still very 
acceptable for DAGs with up to 100 nodes. 

The work we present can be continued by exploring other possibilities for 
the conditions we set up in MinWidth. However, we believe that MinWidth finds 
layerings which are narrow enough for practical applications. Further research 
could be related to the optimization of the running time of MinWidth and to 
experiments with larger DAGs and with DAGs with variable node widths. 